(57) The present invention provides polynucleotide
sequences of the genome of Staphylococcus aureus,
polypeptide sequences encoded by the polynucleotide
sequences, corresponding polynucleotides and
polypeptides, vectors and hosts comprising the
polynucleotides, and assays and other uses thereof. The
present invention further provides polynucleotide and
polypeptide sequence information stored on computer
readable media, and computer-based systems and
methods which facilitate its use.
Joint infections

S. aureus infects bone joints, causing diseases such as osteomyelitis.

Osteomyelitis

S. aureus is the most common causative agent of haematogenous osteomyelitis. The disease affects
children and adolescents more than adults and it is associated with non-penetrating injuries to bones. Infection typically
occurs at the ends of long, growing bones, hence its occurrence in physically immature populations. Most often, infection
is localized in the vicinity of sprouting capillary loops adjacent to epiphysial growth plates in the end of long, growing
bones.

Skin infections 

S. aureus is the most common pathogen of such minor skin infections as abscesses and boils. Such infections 
often are resolved by normal host response mechanisms, but they also can develop into severe internal infections. 
Recurrent infections of the nasal passages plague nasal carriers of S. aureus.

Surgical Wound Infections 

Surgical wounds often penetrate far into the body. Infection of such wounds thus poses a grave risk to the patient.
S. aureus is the most important causative agent of infections in surgical wounds. S. aureus is unusually adept at
invading surgical wounds; sutured wounds can be infected by far fewer S. aureus cells than are necessary to cause
infection in normal skin. Invasion of surgical wounds can lead to severe S. aureus septicaemia. Invasion of the blood
stream by S. aureus can lead to seeding and infection of internal organs, particularly heart valves and bone, causing
systemic diseases, such as endocarditis and osteomyelitis.

Scalded Skin Syndrome 

S. aureus is responsible for "scalded skin syndrome" (also called toxic epidermal necrosis, Ritter's disease and
Lyell's disease). This disease occurs in older children, typically in outbreaks caused by flowering of S. aureus strains
that produce exfoliatin (also called scalded skin syndrome toxin). Although the bacteria initially may infect only a minor
lesion, the toxin destroys intercellular connections, spreads epidermal layers and allows the infection to penetrate the
outer layer of the skin, producing the desquamation that typifies the disease. Shedding of the outer layer of skin
generally reveals normal skin below, but fluid lost in the process can produce severe injury in young children if it is not
treated properly.

Toxic Shock Syndrome 

Toxic shock syndrome is caused by strains of S. aureus that produce the so-called toxic shock syndrome toxin.
The disease can be caused by S. aureus infection at any site, but it is too often erroneously viewed as a
disease solely of women who use tampons. The disease involves toxaemia and septicaemia, and can be fatal.

Nosocomial Infections

In the 1984 National Nosocomial Infection Surveillance Study ("NNIS") S. aureus was the most prevalent agent
of surgical wound infections in many hospital services, including medicine, surgery, obstetrics, pediatrics and newborns.

Resistance to drugs of S. aureus strains 

Prior to the introduction of penicillin the prognosis for patients seriously infected with S. aureus was unfavorable.
Following the introduction of penicillin in the early 1940s even the worst S. aureus infections generally could be treated
successfully. The emergence of penicillin-resistant strains of S. aureus did not take long, however. Most strains of S.
aureus encountered in hospital infections today do not respond to penicillin; although, fortunately, this is not the case
for S. aureus encountered in community infections.

It is well known now that penicillin-resistant strains of S. aureus produce a lactamase which converts penicillin to
penicilloic acid, and thereby destroys antibiotic activity. Furthermore, the lactamase gene often is propagated
episomally, typically on a plasmid, and often is only one of several genes on an episomal element that, together, confer
multidrug resistance.

Methicillins, introduced in the 1960s, largely overcame the problem of penicillin resistance in S. aureus. These
compounds conserve the portions of penicillin responsible for antibiotic activity and modify or alter other portions that
make penicillin a good substrate for inactivating lactamases. However, methicillin resistance has emerged in S. aureus,
along with resistance to many other antibiotics effective against this organism, including aminoglycosides, tetracycline,
chloramphenicol, macrolides and lincosamides. In fact, methicillin-resistant strains of S. aureus generally are multiply
drug resistant.
Molecular Genetics of Staphylococcus aureus

[...] importance in, among other things, human disease, [...] known about the genome of this [...]

Most genetic studies of S. aureus have been carried out using the strain NCTC [...]

[...] containing a SmaI recognition sequence introduced into the chromosome using a transposon [...]

[...] genomic organization would dramatically improve understanding of disease [...]

preventing, ameliorating, arresting and reversing disease [...]
information herein described stored in a data storage means. Such systems are designed to identify commercially
important fragments of the Staphylococcus aureus genome. 

Another embodiment of the present invention is directed to fragments, preferably isolated fragments, of the
Staphylococcus aureus genome having particular structural or functional attributes. Such fragments of the Staphylococcus
aureus genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter
referred to as open reading frames or "ORFs," fragments which modulate the expression of an operably linked ORF,
hereinafter referred to as expression modulating fragments or "EMFs," and fragments which can be used to diagnose
the presence of Staphylococcus aureus in a sample, hereinafter referred to as diagnostic fragments or "DFs."

Each of the ORFs in fragments of the Staphylococcus aureus genome disclosed in Tables 1-3, and the EMFs
found 5' to the ORFs, can be used in numerous ways as polynucleotide reagents. For instance, the sequences can be
used as diagnostic probes or amplification primers for detecting or determining the presence of a specific microbe in
a sample, to selectively control gene expression in a host and in the production of polypeptides, such as polypeptides
encoded by ORFs of the present invention, particularly those polypeptides that have a pharmacological activity.

The present invention further includes recombinant constructs comprising one or more fragments of the
Staphylococcus aureus genome of the present invention. The recombinant constructs of the present invention comprise
vectors, such as a plasmid or viral vector, into which a fragment of the Staphylococcus aureus genome has been inserted.

The present invention further provides host cells containing any of the isolated fragments of the Staphylococcus
aureus genome of the present invention. The host cells can be a higher eukaryotic host cell, such as a mammalian
cell, a lower eukaryotic cell, such as a yeast cell, or a procaryotic cell such as a bacterial cell.

The present invention is further directed to polypeptides and proteins, preferably isolated polypeptides and
proteins, encoded by ORFs of the present invention. A variety of methods, well known to those of skill in the art, routinely
may be utilized to obtain any of the polypeptides and proteins of the present invention. For instance, polypeptides and
proteins of the present invention having relatively short, simple amino acid sequences readily can be synthesized using
commercially available automated peptide synthesizers. Polypeptides and proteins of the present invention also may
be purified from bacterial cells which naturally produce the protein. Yet another alternative is to purify polypeptides and
proteins of the present invention from cells which have been altered to express them.

The invention further provides polypeptides, preferably isolated polypeptides, comprising Staphylococcus aureus
epitopes and vaccine compositions comprising such polypeptides. Also provided are methods for vaccinating an
individual against Staphylococcus aureus infection.

The invention further provides methods of obtaining homologs of the fragments of the Staphylococcus aureus
genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention.
Specifically, by using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques
such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

The invention further provides antibodies which selectively bind polypeptides and proteins of the present invention. 
Such antibodies include both monoclonal and polyclonal antibodies.

The invention further provides hybridomas which produce the above-described antibodies. A hybridoma is an 
immortalized cell line which is capable of secreting a specific monoclonal antibody. 

The present invention further provides methods of identifying test samples derived from cells which express one 
of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one 
or more of the antibodies of the present invention, or one or more of the DFs or antigens of the present invention, under
conditions which allow a skilled artisan to determine if the sample contains the ORF or product produced therefrom. 

In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry 
out the above-described assays. 

Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers 
which comprises: (a) a first container comprising one of the antibodies, antigens, or one of the DFs of the present
invention; and (b) one or more other containers comprising one or more of the following: wash reagents, reagents 
capable of detecting presence of bound antibodies, antigens or hybridized DFs. 

Using the isolated proteins of the present invention, the present invention further provides methods of obtaining 
and identifying agents capable of binding to a polypeptide or protein encoded by one of the ORFs of the present 
invention. Specifically, such agents include, as further described below, antibodies, peptides, carbohydrates,
pharmaceutical agents and the like. Such methods comprise steps of: (a) contacting an agent with an isolated protein encoded
by one of the ORFs of the present invention; and (b) determining whether the agent binds to said protein.

The present genomic sequences of Staphylococcus aureus will be of great value to all laboratories working with 
this organism and for a variety of commercial purposes. Many fragments of the Staphylococcus aureus genome will 
be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value
to Staphylococcus aureus researchers and of immediate commercial value for the production of proteins or to control
gene expression.

The methodology and technology for elucidating extensive genomic sequences of bacterial and other genomes
has greatly enhanced, and will continue to enhance, the ability to analyze and understand chromosomal organization.
In particular, sequenced
contigs and genomes will provide the models for developing tools for the analysis of chromosome structure and function,
including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of
regulatory elements, the identification of genes with potential industrial applications, and the ability to do comparative
genomic and molecular phylogeny.

FIGURE 1 is a block diagram of a computer system (102) that can be used to implement computer-based systems
of the present invention.

FIGURE 2 is a schematic diagram depicting the data flow and computer programs used to collect, assemble, edit
and annotate the contigs of the Staphylococcus aureus genome of the present invention. Both Macintosh and Unix
platforms are used to handle the AB 373 and 377 sequence data files, largely as described in Kerlavage et al.,
Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, 585, IEEE Computer
Society Press, Washington D.C. (1993). Factura (AB) is a Macintosh program designed for automatic vector sequence
removal and end-trimming of sequence files. The program Loadis runs on a Macintosh platform and parses the feature
data extracted from the sequence files by Factura to the Unix based Staphylococcus aureus relational database.
Assembly of contigs (and whole genome sequences) is accomplished by retrieving a specific set of sequence files and
their associated features using extrseq, a Unix utility for retrieving sequences from an SQL database. The resulting
sequence file is processed by seq_filter to trim portions of the sequences with more than 2% ambiguous nucleotides.
The sequence files were assembled using TIGR Assembler, an assembly engine designed at The Institute for Genomic
Research ("TIGR") for rapid and accurate assembly of thousands of sequence fragments. The collection of contigs
generated by the assembly step is loaded into the database with the lassie program. Identification of open reading
frames (ORFs) is accomplished by processing contigs with zorf. The ORFs are searched against S. aureus sequences
from GenBank and against all protein sequences using the BLASTN and BLASTP programs, described in Altschul et
al., J. Mol. Biol. 215:403-410 (1990). Results of the ORF determination and similarity searching steps were loaded
into the database. As described below, some results of the determination and the searches are set out in Tables 1-3.
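
By way of illustration only, the 2% ambiguity criterion applied during sequence filtering can be sketched as follows. The actual algorithm of seq_filter is not set out in this description; the function names, the end-trimming strategy and the set of IUPAC ambiguity codes used below are assumptions made for this sketch.

```python
# Hypothetical sketch of an ambiguity filter; NOT the seq_filter program itself.
# IUPAC nucleotide codes other than A, C, G and T are treated as ambiguous.
AMBIGUOUS = set("NRYKMSWBDHV")


def ambiguous_fraction(seq):
    """Return the fraction of bases in seq that are ambiguous (0.0 if empty)."""
    if not seq:
        return 0.0
    return sum(base in AMBIGUOUS for base in seq.upper()) / len(seq)


def trim_ambiguous(seq, max_fraction=0.02):
    """Trim bases from the ends of seq until at most max_fraction are ambiguous.

    At each step the ambiguous base nearest to either end is removed together
    with everything between it and that end.
    """
    upper = seq.upper()
    start, end = 0, len(upper)
    n_ambiguous = sum(base in AMBIGUOUS for base in upper)
    while end > start and n_ambiguous / (end - start) > max_fraction:
        # Locate the ambiguous base closest to each end of the current window.
        left = next(i for i in range(start, end) if upper[i] in AMBIGUOUS)
        right = next(i for i in range(end - 1, start - 1, -1)
                     if upper[i] in AMBIGUOUS)
        if left - start <= (end - 1) - right:
            start = left + 1   # cut from the 5' end
        else:
            end = right        # cut from the 3' end
        n_ambiguous -= 1
    return seq[start:end]
```

A read that already satisfies the threshold is returned unchanged, and a read consisting only of ambiguous bases is trimmed to an empty string.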

The present invention is based on the sequencing of fragments of the Staphylococcus aureus genome and analysis
of the sequences. The primary nucleotide sequences generated by sequencing the fragments are provided in SEQ ID
NOS:1-5,191. (As used herein, the "primary sequence" refers to the nucleotide sequence represented by the IUPAC
nomenclature system.)

In addition to the aforementioned Staphylococcus aureus polynucleotides and polynucleotide sequences, the
present invention provides the nucleotide sequences of SEQ ID NOS:1-5,191, or representative fragments thereof, in
a form which can be readily used, analyzed, and interpreted by a skilled artisan.

As used herein, a "representative fragment of the nucleotide sequence depicted in SEQ ID NOS:1-5,191" refers
to any portion of the SEQ ID NOS:1-5,191 which is not presently represented within a publicly available database.
Preferred representative fragments of the present invention are Staphylococcus aureus open reading frames ("ORFs"),
expression modulating fragments ("EMFs") and fragments which can be used to diagnose the presence of
Staphylococcus aureus in a sample ("DFs"). A non-limiting identification of preferred representative fragments is provided in Tables
1-3.

As discussed in detail below, the information provided in SEQ ID NOS:1-5,191 and in Tables 1-3 together with
routine cloning, synthesis, sequencing and assay methods will enable those skilled in the art to clone and sequence
all "representative fragments" of interest, including open reading frames encoding a large variety of Staphylococcus
aureus proteins.

While the presently disclosed sequences of SEQ ID NOS:1-5,191 are highly accurate, sequencing techniques are
not perfect and, in relatively rare instances, further investigation of a fragment or sequence of the invention may reveal
a nucleotide sequence error present in a nucleotide sequence disclosed in SEQ ID NOS:1-5,191. However, once the
present invention is made available (i.e., once the information in SEQ ID NOS:1-5,191 and Tables 1-3 has been made
available), resolving a rare sequencing error in SEQ ID NOS:1-5,191 will be well within the skill of the art. The present
disclosure makes available sufficient sequence information to allow any of the described contigs or portions thereof to
be obtained readily by straightforward application of routine techniques. Further sequencing of such polynucleotides
may proceed in like manner using manual and automated sequencing methods which are employed ubiquitously in the
art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' (AB) AutoAssembler
can be used as an aid during visual inspection of nucleotide sequences. By employing such routine techniques potential
errors readily may be identified and the correct sequence then may be ascertained by targeting further sequencing
effort, also of a routine nature, to the region containing the potential error.

Even if all of the very rare sequencing errors in SEQ ID NOS:1-5,191 were corrected, the resulting nucleotide
sequences would still be at least 95% identical, nearly all would be at least 99% identical, and the great majority would
be at least 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-5,191.

As discussed elsewhere herein, polynucleotides of the present invention readily may be obtained by routine
application of well known and standard procedures for cloning and sequencing DNA. Detailed methods for obtaining
libraries and for sequencing are provided below, for instance. A wide variety of Staphylococcus aureus strains that can
be used to prepare S. aureus genomic DNA for cloning and for obtaining polynucleotides of the present invention are
available to the public from recognized depository institutions, such as the American Type Culture Collection ("ATCC").

The nucleotide sequences of the genomes from different strains of Staphylococcus aureus differ somewhat.
However, the nucleotide sequences of the genomes of all Staphylococcus aureus strains will be at least 95% identical, in
corresponding part, to the nucleotide sequences provided in SEQ ID NOS:1-5,191. Nearly all will be at least 99%
identical and the great majority will be 99.9% identical.

Thus, the present invention further provides nucleotide sequences which are at least 95%, preferably 99% and
most preferably 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-5,191, in a form which can be readily
used, analyzed and interpreted by the skilled artisan.

Methods for determining whether a nucleotide sequence is at least 95%, at least 99% or at least 99.9% identical
to the nucleotide sequences of SEQ ID NOS:1-5,191 are routine and readily available to the skilled artisan. For example,
the well known FASTA algorithm described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988) can be
used to generate the percent identity of nucleotide sequences. The BLASTN program also can be used to generate
an identity score of polynucleotides compared to one another.
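
As a non-limiting sketch, percent identity between two sequences that have already been aligned can be computed as shown below. This is neither the FASTA nor the BLASTN algorithm; it assumes that an alignment (with "-" marking gaps) has been produced by some other tool, and the function names and the treatment of gap columns as mismatches are assumptions of this example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


def identity_tier(aligned_a, aligned_b, tiers=(99.9, 99.0, 95.0)):
    """Return the highest threshold in tiers that is met, or None if none is."""
    identity = percent_identity(aligned_a, aligned_b)
    for threshold in sorted(tiers, reverse=True):
        if identity >= threshold:
            return threshold
    return None
```

The three default tiers mirror the 95%, 99% and 99.9% identity levels recited above; a sequence pair falling below 95% yields None.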

COMPUTER RELATED EMBODIMENTS 

The nucleotide sequences provided in SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide
sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a polynucleotide
sequence of SEQ ID NOS:1-5,191 may be "provided" in a variety of mediums to facilitate use thereof. As used herein,
"provided" refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence
of the present invention; i.e., a nucleotide sequence provided in SEQ ID NOS:1-5,191, a representative fragment
thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical
to a polynucleotide of SEQ ID NOS:1-5,191. Such a manufacture provides a large portion of the Staphylococcus aureus
genome and parts thereof (e.g., a Staphylococcus aureus open reading frame (ORF)) in a form which allows a skilled
artisan to examine the manufacture using means not directly applicable to examining the Staphylococcus aureus
genome or a subset thereof as it exists in nature or in purified form.

In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer
readable media. As used herein, "computer readable media" refers to any medium which can be read and accessed
directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard
disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as
RAM and ROM; and hybrids of these categories, such as magnetic/optical storage media. A skilled artisan can readily
appreciate how any of the presently known computer readable mediums can be used to create a manufacture
comprising computer readable medium having recorded thereon a nucleotide sequence of the present invention. Likewise,
it will be clear to those of skill how additional computer readable media that may be developed also can be used to
create analogous manufactures having recorded thereon a nucleotide sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled
artisan can readily adopt any of the presently known methods for recording information on computer readable medium
to generate manufactures comprising the nucleotide sequence information of the present invention.

A variety of data storage structures are available to a skilled artisan for creating a computer readable medium
having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will
generally be based on the means chosen to access the stored information. In addition, a variety of data processor
programs and formats can be used to store the nucleotide sequence information of the present invention on computer
readable medium. The sequence information can be represented in a word processing text file, formatted in
commercially available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored
in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of
data-processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having
recorded thereon the nucleotide sequence information of the present invention.

Computer software is publicly available which allows a skilled artisan to access sequence information provided in
a computer readable medium. Thus, by providing in computer readable form the nucleotide sequences of SEQ ID
NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and
most preferably at least 99.9% identical to a sequence of SEQ ID NOS:1-5,191 the present invention enables the
skilled artisan routinely to access the provided sequence information for a wide variety of purposes.
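
Purely as an illustration of the two storage options mentioned above (an ASCII text file and a database application), the following sketch writes a sequence in FASTA-style ASCII text and stores it in a relational table. The SQLite engine from the Python standard library is used here only as a stand-in for database applications such as DB2, Sybase or Oracle; the table layout, function names and the identifier "contig_1" are hypothetical.

```python
import sqlite3


def to_fasta(identifier, sequence, width=60):
    """Format one sequence as FASTA-style ASCII text with fixed-width lines."""
    lines = [">" + identifier]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA-style text into a dict mapping identifier to sequence."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def store_in_database(conn, records):
    """Record sequences in a simple (hypothetical) relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(id TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sequences (id, residues) VALUES (?, ?)",
        list(records.items()),
    )
    conn.commit()


def fetch_sequence(conn, identifier):
    """Return the stored sequence for identifier, or None if absent."""
    row = conn.execute(
        "SELECT residues FROM sequences WHERE id = ?", (identifier,)
    ).fetchone()
    return row[0] if row else None
```

For example, text produced by to_fasta("contig_1", ...) can be parsed back with parse_fasta and loaded into an in-memory database opened with sqlite3.connect(":memory:"), after which fetch_sequence retrieves the same residues.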

The examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol.
215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase
system was used to identify open reading frames (ORFs) within the Staphylococcus aureus genome which contain
homology to ORFs or proteins from both Staphylococcus aureus and from other organisms. Among the ORFs discussed
herein are protein encoding fragments of the Staphylococcus aureus genome useful in producing commercially
important proteins, such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.

The present invention further provides systems, particularly computer-based systems, which contain the sequence
information described herein. Such systems are designed to identify, among other things, commercially important
fragments of the Staphylococcus aureus genome.

As used herein, "a computer-based system" refers to the hardware means, software means, and data storage
means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means
of the computer-based systems of the present invention comprises a central processing unit (CPU), input means,
output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available
computer-based systems is suitable for use in the present invention.

As stated above, the computer-based systems of the present invention comprise a data storage means having 
stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means 
for supporting and implementing a search means. 

As used herein, "data storage means" refers to memory which can store nucleotide sequence information of the
present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide
sequence information of the present invention.

As used herein, "search means" refers to one or more programs which are implemented on the computer-based
system to compare a target sequence or target structural motif with the sequence information stored within the data
storage means. Search means are used to identify fragments or regions of the present genomic sequences which
match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly and a variety
of commercially available software for conducting search means is available and can be used in the computer-based systems
of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and
BLASTX (NCBIA). A skilled artisan can readily recognize that any one of the available algorithms or implementing
software packages for conducting homology searches can be adapted for use in the present computer-based systems.

As used herein, a "target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two
or more amino acids. A skilled artisan can readily recognize that the longer a target sequence is, the less likely a target
sequence will be present as a random occurrence in the database. The most preferred sequence length of a target
sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized
that searches for commercially important fragments, such as sequence fragments involved in gene expression and
protein processing, may be of shorter length.
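
The relationship between target length and random occurrence noted above can be illustrated with a rough estimate. Assuming, for this sketch only, that residues are independent and equally frequent, a target of length L is expected to occur by chance about (N - L + 1) / k^L times in a database of N residues, where k is the alphabet size (4 for nucleotides, 20 for amino acids). The function name and the database size used in the usage note are assumptions and do not come from this description.

```python
def expected_random_hits(target_length, database_length, alphabet_size=4):
    """Expected number of chance exact matches of a target in a database.

    Assumes independent, uniformly distributed residues and a single strand;
    real genomes deviate from this, so the value is only a rough guide.
    """
    positions = max(database_length - target_length + 1, 0)
    return positions * (1.0 / alphabet_size) ** target_length
```

Under these assumptions, a six-nucleotide target in a hypothetical database of about 2,800,000 nucleotides is expected to occur several hundred times by chance, whereas a 30-nucleotide target is expected to occur far less than once, which is consistent with the preference for longer target sequences stated above.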

As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or
combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration which is formed
upon the folding of the target motif. There are a variety of target motifs known in the art. Protein target motifs include,
but are not limited to, enzymic active sites and signal sequences. Nucleic acid target motifs include, but are not limited
to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).

A variety of structural formats for the input and output means can be used to input and output the information in
the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the
Staphylococcus aureus genomic sequences possessing varying degrees of homology to the target sequence or target
motif. Such presentation provides a skilled artisan with a ranking of sequences which contain various amounts of the
target sequence or target motif and identifies the degree of homology contained in the identified fragment.
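
A ranked output of the kind described above might be sketched as follows. The scoring used here is a simple ungapped sliding-window identity, chosen only to keep the example short; it is not the BLAST or BLAZE algorithm, and the function names and fragment labels are hypothetical.

```python
def best_window_identity(fragment, target):
    """Return (best percent identity, 0-based position) of target in fragment.

    Every ungapped window of len(target) in fragment is compared with target;
    (0.0, -1) is returned when the fragment is shorter than the target.
    """
    if not target:
        raise ValueError("target must not be empty")
    fragment, target = fragment.upper(), target.upper()
    if len(fragment) < len(target):
        return 0.0, -1
    best_score, best_pos = -1.0, -1
    for pos in range(len(fragment) - len(target) + 1):
        window = fragment[pos:pos + len(target)]
        matches = sum(a == b for a, b in zip(window, target))
        score = 100.0 * matches / len(target)
        if score > best_score:
            best_score, best_pos = score, pos
    return best_score, best_pos


def rank_fragments(fragments, target):
    """Rank named fragments by their best similarity to target, best first."""
    results = []
    for name, sequence in fragments.items():
        score, pos = best_window_identity(sequence, target)
        results.append((name, score, pos))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Each entry of the returned list gives the fragment name, the degree of similarity found and the position of the best match, so that the fragments most similar to the target appear first.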

A variety of comparing means can be used to compare a target sequence or target motif with the data storage
means to identify sequence fragments of the Staphylococcus aureus genome. In the present examples, software
which implements the BLAST and BLAZE algorithms, described in Altschul et al., J. Mol. Biol. 215:403-410
(1990), was used to identify open reading frames within the Staphylococcus aureus genome. A skilled artisan can
readily recognize that any one of the publicly available homology search programs can be used as the search means
for the computer-based systems of the present invention. Of course, suitable proprietary systems that may be known
to those of skill also may be employed in this regard.

Figure 1 provides a block diagram of a computer system illustrative of embodiments of this aspect of the present
invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104
are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage
devices 110, such as a hard drive 112 and a removable medium storage device 114. The removable medium storage
device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable
storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data
recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes
appropriate software for reading the control logic and/or the data from the removable medium storage device 114, once
it is inserted into the removable medium storage device 114.

A nucleotide sequence of the present invention may be stored in a well known manner in the main memory 108, 
any of the secondary storage devices 110, and/or a removable storage medium 116. During execution, software for 
accessing and processing the genomic sequence (such as search tools, comparing tools, etc.) resides in main memory
108, in accordance with the requirements and operating parameters of the operating system, the hardware system
and the software program or programs.

BIOCHEMICAL EMBODIMENTS

Other embodiments of the present invention are directed to fragments of the Staphylococcus aureus genome,
preferably to isolated fragments. The fragments of the Staphylococcus aureus genome of the present invention include,
but are not limited to fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which
modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs) and
fragments which can be used to diagnose the presence of Staphylococcus aureus in a sample, hereinafter diagnostic
fragments (DFs).

As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the Staphylococcus aureus genome" 
refers to a nucleic acid molecule possessing a specific nucleotide sequence which has been subjected to purification 
means to reduce, from the composition, the number of compounds which are normally associated with the composition.
Particularly, the term refers to the nucleic acid molecules having the sequences set out in SEQ ID NOS:1-5,191, to 
representative fragments thereof as described above, to polynucleotides at least 95%, preferably at least 99% and 
especially preferably at least 99.9% identical in sequence thereto, also as set out above. 

A variety of purification means can be used to generate the isolated fragments of the present invention. These
include, but are not limited to, methods which separate constituents of a solution based on charge, solubility, or size.

In one embodiment, Staphylococcus aureus DNA can be mechanically sheared to produce fragments of 15-20 kb
in length. These fragments can then be used to generate a Staphylococcus aureus library by inserting them into
lambda clones as described in the Examples below. Primers flanking, for example, an ORF, such as those enumerated
in Tables 1-3, can then be generated using nucleotide sequence information provided in SEQ ID NOS:1-5,191. Well
known and routine techniques of PCR cloning then can be used to isolate the ORF from the lambda DNA library of
Staphylococcus aureus genomic DNA. Thus, given the availability of SEQ ID NOS:1-5,191, the information in Tables
1, 2 and 3, and the information that may be obtained readily by analysis of the sequences of SEQ ID NOS:1-5,191
using methods set out above, those of skill will be enabled by the present disclosure to isolate any ORF-containing or
other nucleic acid fragment of the present invention.

The isolated nucleic acid molecules of the present invention include, but are not limited to, single stranded and
double stranded DNA, and single stranded RNA.

As used herein, an "open reading frame," ORF, means a series of triplets coding for amino acids without any
termination codons and is a sequence translatable into protein.
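By way of illustration only, the definition above can be sketched in Python as a scan for runs of triplets that contain no termination codon. This is a minimal sketch and is not the GeneMark method used to compile the tables; the function name, the `min_codons` parameter and the restriction to the three frames of the strand as given are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard termination codons

def stop_free_runs(seq, min_codons=80):
    """Return (frame, start, length_nt) for each run of codons free of stop codons.

    frame is 1, 2 or 3; start is the 1-based position of the first nucleotide
    of the run; length_nt is the run length in nucleotides.
    Only the strand as given is scanned (an assumption of this sketch).
    """
    seq = seq.upper()
    runs = []
    for offset in range(3):
        run_start = offset
        for i in range(offset, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    runs.append((offset + 1, run_start + 1, i - run_start))
                run_start = i + 3
        # Handle the run that reaches the end of the sequence without a stop.
        end = offset + ((len(seq) - offset) // 3) * 3
        if (end - run_start) // 3 >= min_codons:
            runs.append((offset + 1, run_start + 1, end - run_start))
    return runs
```

A production ORF finder would normally also consider start codons and the opposite strand, both of which are omitted here for brevity.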

Tables 1, 2 and 3 list ORFs in the Staphylococcus aureus genomic contigs of the present invention that were
identified as putative coding regions by the GeneMark software using organism-specific second-order Markov
probability transition matrices. It will be appreciated that other criteria can be used, in accordance with well known analytical
methods, such as those discussed herein, to generate more inclusive, more restrictive or more selective lists.

Table 1 sets out ORFs in the Staphylococcus aureus contigs of the present invention that are at least 80 amino
acids long and that, over a continuous region of at least 50 bases, are 95% or more identical (by BLAST analysis) to
an S. aureus nucleotide sequence available through Genbank in November 1996.

Table 2 sets out ORFs in the Staphylococcus aureus contigs of the present invention that are not in Table 1 and
match, with a BLASTP probability score of 0.01 or less, a polypeptide sequence available through Genbank by
September 1996.

Table 3 sets out ORFs in the Staphylococcus aureus contigs of the present invention that do not match significantly,
by BLASTP analysis, a polypeptide sequence available through Genbank by September 1996.
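The assignment of an ORF to Table 1, 2 or 3 under the criteria just stated can be sketched as follows. This is an illustrative sketch only: the `OrfHits` record and its field names are hypothetical, and the sketch assumes that "do not match significantly" for Table 3 means a BLASTP probability score above the 0.01 cutoff given for Table 2.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OrfHits:
    """Hypothetical summary of search results for one ORF."""
    length_aa: int                              # ORF length in amino acids
    saureus_identity: Optional[float] = None    # percent identity to a known S. aureus nucleotide sequence
    saureus_match_len: int = 0                  # length in bases of that continuous matching region
    blastp_probability: Optional[float] = None  # best BLASTP probability score, None if no hit

def assign_table(orf: OrfHits) -> int:
    """Return 1, 2 or 3 according to the criteria stated for Tables 1-3."""
    if (orf.length_aa >= 80
            and orf.saureus_identity is not None
            and orf.saureus_identity >= 95.0
            and orf.saureus_match_len >= 50):
        return 1
    if orf.blastp_probability is not None and orf.blastp_probability <= 0.01:
        return 2
    return 3  # assumption: anything else counts as "no significant match"
```

For example, `assign_table(OrfHits(length_aa=120, saureus_identity=98.0, saureus_match_len=300))` yields Table 1 under these assumptions.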

In each table, the first and second columns identify the ORF by, respectively, contig number and ORF number
within the contig; the third column indicates the reading frame, taking the first 5' nucleotide of the contig as the start of
the +1 frame; the fourth column indicates the first nucleotide of the ORF, counting from the 5' end of the contig strand;
and the fifth column indicates the length of each ORF in nucleotides.

In Tables 1 and 2, column six lists the "Reference" for the closest matching sequence available through Genbank.
These reference numbers are the database entry numbers commonly used by those of skill in the art, who will be
familiar with their denominations. Descriptions of the nomenclature are available from the National Center for
Biotechnology Information. Column seven in Tables 1 and 2 provides the "gene name" of the matching sequence; column eight
provides the "BLAST identity" score from the comparison of the ORF and the homologous gene; and column nine
indicates the length in nucleotides of the "highest scoring segment pair" identified by the BLAST identity analysis.

In Table 3, the last column, column six, indicates the length of each ORF in amino acid residues.
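By way of illustration only, the relationship between the reading frame in column three and the first nucleotide in column four can be sketched in Python for an ORF lying on the contig strand as given. The function name is hypothetical, and the numbering convention for ORFs on the opposite strand is not stated in this passage, so it is deliberately left out of the sketch.

```python
def forward_frame(first_nucleotide: int) -> int:
    """Return the reading frame (1, 2 or 3) of an ORF on the contig strand.

    first_nucleotide is the 1-based position of the ORF's first nucleotide,
    counted from the 5' end of the contig; position 1 starts the +1 frame.
    """
    if first_nucleotide < 1:
        raise ValueError("positions are 1-based")
    return (first_nucleotide - 1) % 3 + 1
```

Under this convention, ORFs beginning at positions 1, 4 and 7 all lie in the +1 frame.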
The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art.
For example, two polypeptides 10 amino acids in length which differ at three amino acid positions (e.g., at positions
1, 3 and 5) are said to have a percent identity of 70%. However, the same two polypeptides would be deemed to have
a percent similarity of 80% if, for example at position 5, the amino acid moieties, although not identical, were "similar"
(i.e., possessed similar biochemical characteristics). Many programs for analysis of nucleotide or amino acid sequence
similarity, such as FASTA and BLAST, specifically list percent identity of a matching region as an output parameter. Thus,
for instance, Tables 1 and 2 herein enumerate the "percent identity" of the "highest scoring segment pair" in each ORF
and its listed relative. Further details concerning the algorithms and criteria used for homology searches are provided
below and are described in the pertinent literature highlighted by the citations provided below.
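The 70% and 80% figures in the example above can be reproduced with a short Python sketch. The residue groupings used to decide which amino acids are "similar" are an assumption chosen for illustration; programs such as FASTA and BLAST rely on scoring matrices instead, and the two peptide sequences are invented for the example.

```python
# Illustrative groups of biochemically similar residues (an assumption of this sketch).
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ")]

def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions holding identical residues."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def percent_similarity(a: str, b: str) -> float:
    """Percentage of aligned positions holding identical or similar residues."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    def similar(x, y):
        return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)
    return 100.0 * sum(similar(x, y) for x, y in zip(a, b)) / len(a)

# Two invented 10-residue peptides differing at positions 1, 3 and 5;
# only the position-5 pair (F versus Y) falls within one similarity group.
first = "ACDEFGHIKL"
second = "GCNEYGHIKL"
identity = percent_identity(first, second)      # 7 of 10 positions identical
similarity = percent_similarity(first, second)  # 8 of 10 positions identical or similar
```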

It will be appreciated that other criteria can be used to generate more inclusive and more exclusive listings of the 
types set out in the tables. As those of skill will appreciate, narrow and broad searches both are useful. Thus, a skilled 
artisan can readily identify ORFs in contigs of the Staphylococcus aureus genome other than those listed in Tables
1-3, such as ORFs which are overlapping or encoded by the opposite strand of an identified ORF in addition to those 
ascertainable using the computer-based systems of the present invention. 

As used herein, an "expression modulating fragment," EMF, means a series of nucleotide molecules which
modulates the expression of an operably linked ORF or EMF.

As used herein, a sequence is said to "modulate the expression of an operably linked sequence" when the
expression of the sequence is altered by the presence of the EMF. EMFs include, but are not limited to, promoters and
promoter modulating sequences (inducible elements). One class of EMFs are fragments which induce the expression
of an operably linked ORF in response to a specific regulatory factor or physiological event.

EMF sequences can be identified within the contigs of the Staphylococcus aureus genome by their proximity to
the ORFs provided in Tables 1-3. An intergenic segment, or a fragment of the intergenic segment, from about 10 to
200 nucleotides in length, taken from any one of the ORFs of Tables 1-3 will modulate the expression of an operably
linked ORF in a fashion similar to that found with the naturally linked ORF sequence. As used herein, an "intergenic
segment" refers to fragments of the Staphylococcus aureus genome which are between two ORF(s) herein described.
EMFs also can be identified using known EMFs as a target sequence or target motif in the computer-based systems
of the present invention. Further, the two methods can be combined and used together.
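The proximity-based approach described above can be sketched in Python as the extraction of intergenic segments from a contig, given ORF coordinates. This is a minimal sketch under stated assumptions: the function name and the coordinate format (1-based, inclusive start and end) are hypothetical, and only gaps whose whole length falls between 10 and 200 nucleotides are reported, whereas the text also contemplates fragments of longer intergenic segments.

```python
def intergenic_segments(contig, orfs, min_len=10, max_len=200):
    """Return (start, end, sequence) for gaps between consecutive ORFs.

    orfs is a list of (start, end) pairs, 1-based and inclusive.
    Overlapping or nested ORFs are handled by tracking the furthest end seen.
    """
    ordered = sorted(orfs)
    if not ordered:
        return []
    segments = []
    furthest_end = ordered[0][1]
    for start, end in ordered[1:]:
        gap_len = start - furthest_end - 1
        if min_len <= gap_len <= max_len:
            # Slice is 0-based and half-open, hence the index shift.
            segments.append((furthest_end + 1, start - 1, contig[furthest_end:start - 1]))
        furthest_end = max(furthest_end, end)
    return segments
```

Candidate segments obtained in this manner would still require confirmation of EMF activity by experiment.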

The presence and activity of an EMF can be confirmed using an EMF trap vector. An EMF trap vector contains a 
cloning site linked to a marker sequence. A marker sequence encodes an identifiable phenotype, such as antibiotic 
resistance or a complementing nutrition auxotrophic factor, which can be identified or assayed when the EMF trap 
vector is placed within an appropriate host under appropriate conditions. As described above, an EMF will modulate the
expression of an operably linked marker sequence. A more detailed discussion of various marker sequences is provided
below.

A sequence which is suspected of being an EMF is cloned in all three reading frames in one or more restriction
sites upstream from the marker sequence in the EMF trap vector. The vector is then transformed into an appropriate
host using known procedures and the phenotype of the transformed host is examined under appropriate conditions.
As described above, an EMF will modulate the expression of an operably linked marker sequence.

As used herein, a "diagnostic fragment," DF, means a series of nucleotide molecules which selectively hybridize 
to Staphylococcus aureus sequences. DFs can be readily identified by identifying unique sequences within contigs of 
the Staphylococcus aureus genome, such as by using well-known computer analysis software, and by generating and 
testing probes or amplification primers consisting of the DF sequence in an appropriate diagnostic format which
determines amplification or hybridization selectivity.

The sequences falling within the scope of the present invention are not limited to the specific sequences herein
described, but also include allelic and species variations thereof. Allelic and species variations can be routinely
determined by comparing the sequences provided in SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide
sequence at least 95%, preferably 99% and most preferably 99.9% identical to SEQ ID NOS:1-5,191, with a sequence
from another isolate of the same species.

Furthermore, to accommodate codon variability, the invention includes nucleic acid molecules coding for the same
amino acid sequences as do the nucleic acid sequences mentioned above. In other words, in the coding region of an 
ORF, substitution of one codon for another which encodes the same amino acid is expressly contemplated. 
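Codon variability of this kind can be illustrated with a short Python sketch that translates two coding sequences and checks whether they differ in nucleotide sequence yet encode the same amino acids. The sketch uses the standard genetic code; the function names and the example sequences are invented for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str) -> str:
    """Translate a DNA coding sequence; '*' marks a termination codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def encodes_same_polypeptide(a: str, b: str) -> bool:
    """True when a and b differ in sequence but encode identical amino acids."""
    return a.upper() != b.upper() and translate(a) == translate(b)
```

For instance, the invented sequences ATGGCTAAA and ATGGCCAAG both translate to Met-Ala-Lys and so satisfy the check.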

Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment, 
such as an ORF, in both directions (i.e., sequence both strands). Alternatively, error screening can be performed by
sequencing corresponding polynucleotides of Staphylococcus aureus origin isolated by using part or all of the fragments 
in question as a probe or primer. 

Each of the ORFs of the Staphylococcus aureus genome disclosed in Tables 1, 2 and 3, and the EMFs found 5'
to the ORFs, can be used as polynucleotide reagents in numerous ways. For example, the sequences can be used
as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe in a sample,
particularly Staphylococcus aureus. Especially preferred in this regard are ORFs such as those of Table 3, which do not
match previously characterized sequences from other organisms and thus are most likely to be highly selective for
Staphylococcus aureus. Also particularly preferred are ORFs that can be used to distinguish between strains of
Staphylococcus aureus, particularly those that distinguish medically important strains, such as drug-resistant strains.

In addition, the fragments of the present invention, as broadly described, can be used to control gene expression
through triple helix formation or antisense DNA or RNA, both of which methods are based on the binding of a
polynucleotide sequence to DNA or RNA. Triple helix formation optimally results in a shut-off of RNA transcription from DNA,
while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Information from the
sequences of the present invention can be used to design antisense and triple helix-forming oligonucleotides.
Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary
to a region of the gene involved in transcription, for triple-helix formation, or to the mRNA itself, for antisense inhibition.
Both techniques have been demonstrated to be effective in model systems, and the requisite techniques are well known
and involve routine procedures. Triple helix techniques are discussed in, for example, Lee et al., Nucl. Acids Res. 6:
3073 (1979); Cooney et al., Science 241: 456 (1988); and Dervan et al., Science 251: 1360 (1991). Antisense
techniques in general are discussed in, for instance, Okano, J. Neurochem. 56: 560 (1991) and
OLIGODEOXYNUCLEOTIDES AS ANTISENSE INHIBITORS OF GENE EXPRESSION, CRC Press, Boca Raton, FL (1988).
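The design of an antisense oligonucleotide complementary to a region of an mRNA can be sketched in Python as taking the reverse complement of a 20 to 40 base target region. This is an illustrative sketch only: the function name, the choice of a DNA oligonucleotide as output and the example mRNA are assumptions, and practical design would also weigh target accessibility and specificity.

```python
# Maps each mRNA (or DNA) base to its complementary DNA base.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligonucleotide complementary to mrna[start .. start+length-1].

    start is 1-based. The result is written 5' to 3', so it is the reverse
    complement of the target region.
    """
    if not 20 <= length <= 40:
        raise ValueError("length is expected to be 20 to 40 bases")
    target = mrna.upper()[start - 1:start - 1 + length]
    if len(target) < length:
        raise ValueError("target region runs past the end of the sequence")
    return target.translate(COMPLEMENT)[::-1]

# Invented mRNA fragment used only to exercise the function.
oligo = antisense_oligo("AUGGCUAAAGGUCCUGAAUUCGAUCG", start=1, length=20)
```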

The present invention further provides recombinant constructs comprising one or more of the Staphylococcus
aureus genomic fragments and contigs of the present invention. Certain preferred recombinant constructs of
the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the Staphylococcus
aureus genome has been inserted, in a forward or reverse orientation. In the case of a vector comprising one of the
ORFs of the present invention, the vector may further comprise regulatory sequences, including, for example, a
promoter, operably linked to the ORF. For vectors comprising the EMFs of the present invention, the vector may further
comprise a marker sequence or heterologous ORF operably linked to the EMF.

Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially
available for generating the recombinant constructs of the present invention. The following vectors are provided by
way of example. Useful bacterial vectors include phagescript, PsiX174, pBluescript SK and KS (+ and -), pNH8a,
pNH16a, pNH18a, pNH46a (available from Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available
from Pharmacia). Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (available from Stratagene);
pSVK3, pBPV, pMSG, pSVL (available from Pharmacia).

Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other
vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial
promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV
thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate
vector and promoter is well within the level of ordinary skill in the art.

The present invention further provides host cells containing any one of the isolated Staphylococcus
aureus genomic fragments and contigs of the present invention, wherein the fragment has been introduced into the
host cell using known methods. The host cell can be a higher eukaryotic host cell, such as a mammalian cell, a lower
eukaryotic host cell, such as a yeast cell, or a prokaryotic cell, such as a bacterial cell.

A polynucleotide of the present invention, such as a recombinant construct comprising an ORF of the present 
invention, may be introduced into the host by a variety of well established techniques that are standard in the art, such 
as calcium phosphate transfection, DEAE-dextran mediated transfection and electroporation, which are described in,
for instance, Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY (1986). 

A host cell containing one of the Staphylococcus aureus genomic fragments and contigs of the
present invention can be used in conventional manners to produce the gene product encoded by the isolated fragment
(in the case of an ORF) or can be used to produce a heterologous protein under the control of the EMF.

The present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present
invention or by degenerate variants of the nucleic acid fragments of the present invention. By "degenerate variant" is
intended nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by
nucleotide sequence but, due to the degeneracy of the Genetic Code, encode an identical polypeptide sequence.

Preferred nucleic acid fragments of the present invention are the ORFs depicted in Tables 2 and 3 which encode 
proteins. 

A variety of methodologies known in the art can be utilized to obtain any one of the isolated polypeptides or proteins
of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially
available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides.
Such short fragments as may be obtained most readily by synthesis are useful, for example, in generating antibodies 
against the native polypeptide, as discussed further below. 

In an alternative method, the polypeptide or protein is purified from bacterial cells which naturally produce the
polypeptide or protein. One skilled in the art can readily employ well-known methods for isolating polypeptides and
proteins to isolate and purify polypeptides or proteins of the present invention produced naturally by a bacterial strain,
or by other methods. Methods for isolation and purification that can be employed in this regard include, but are not
limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography, and
immuno-affinity chromatography.

The polypeptides and proteins of the present invention also can be purified from cells which have been altered to
express the desired polypeptide or protein. As used herein, a cell is said to be altered to express a desired polypeptide
or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein which it normally
does not produce or which the cell normally produces at a lower level. Those skilled in the art can readily adapt
procedures for introducing and expressing either recombinant or synthetic sequences in eukaryotic or prokaryotic cells
in order to generate a cell which produces one of the polypeptides or proteins of the present invention.

Any host/vector system can be used to express one or more of the ORFs of the present invention. These include,
but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic
hosts such as E. coli and B. subtilis. The most preferred cells are those which do not normally express the particular
polypeptide or protein or which express the polypeptide or protein at a low natural level.

"Recombinant," as used herein, means that a polypeptide or protein is derived from recombinant (e.g., microbial 
or mammalian) expression systems. "Microbial" refers to recombinant polypeptides or proteins made in bacterial or 
fungal (e.g., yeast) expression systems. As a product, "recombinant microbial" defines a polypeptide or protein
essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or
proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or
proteins expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.

"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides. Generally, DNA segments encoding the 
polypeptides and proteins provided by this invention are assembled from fragments of the Staphylococcus aureus 
genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is
capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a
microbial or viral operon.

"Recombinant expression vehicle or vector" refers to a plasmid or phage or virus or vector, for expressing a polypep- 
tide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly 
of (1) genetic regulatory elements necessary for gene expression in the host, including elements required to initiate
and maintain transcription at a level sufficient for suitable expression of the desired polypeptide, including, for example,
promoters and, where necessary, an enhancer and a polyadenylation signal; (2) a structural or coding sequence
which is transcribed into mRNA and translated into protein; and (3) appropriate signals to initiate translation at the
beginning of the desired coding region and terminate translation at its end. Structural units intended for use in yeast
or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated
protein by a host cell. Alternatively, where recombinant protein is expressed without a leader or transport sequence,
it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the
expressed recombinant protein to provide a final product.

"Recombinant expression system" means host cells which have stably integrated a recombinant transcriptional 
unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally. The cells can be
prokaryotic or eukaryotic. Recombinant expression systems as defined herein will express heterologous polypeptides or
proteins upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of
appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived
from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic
and eukaryotic hosts are described in Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd
Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989), the disclosure of which is hereby
incorporated by reference in its entirety.

Generally, recombinant expression vectors will include origins of replication and selectable markers permitting
transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and S. cerevisiae TRP1 gene, and a
promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such
promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK),
alpha-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled
in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable
of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the
heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired
characteristics, e.g., stabilization or simplified purification of expressed recombinant product.

Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a
desired protein together with suitable translation initiation and termination signals in operable reading phase with a
functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication
to ensure maintenance of the vector and, when desirable, provide amplification within the host.

Suitable prokaryotic hosts for transformation include strains of Staphylococcus aureus, E. coli, B. subtilis,
Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus. Others
may also be employed as a matter of choice.

As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable 
marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements 
of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 
(available from Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (available from Promega Biotec, Madison,
WI, USA). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence
to be expressed. 

Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the 
selected promoter, where it is inducible, is derepressed or induced by appropriate means (e.g., temperature shift or 
chemical induction) and cells are cultured for an additional period to provide for expression of the induced gene product.
Thereafter cells are typically harvested, generally by centrifugation, disrupted to release expressed protein, generally 
by physical or chemical means, and the resulting crude extract is retained for further purification. 

Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of
mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23: 175
(1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and
BHK cell lines.

Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also
any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination
sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for
example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites, may be used to provide the required
nontranscribed genetic elements.

Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from
cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps.
Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw
cycling, sonication, mechanical disruption, or use of cell lysing agents. Protein refolding steps can be used, as
necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can
be employed for final purification steps.

An additional aspect of the invention includes Staphylococcus aureus polypeptides which are useful as
immunodiagnostic antigens and/or immunoprotective vaccines, collectively "immunologically useful polypeptides". Such
immunologically useful polypeptides may be selected from the ORFs disclosed herein based on techniques well known
in the art and described elsewhere herein. The inventors have used the following criteria to select several
immunologically useful polypeptides:

As is known in the art, an amino terminal type I signal sequence directs a nascent protein across the plasma and
outer membranes to the exterior of the bacterial cell. Such outer membrane polypeptides are expected to be
immunologically useful. According to Izard, J. W. et al., Mol. Microbiol. 13, 765-773 (1994), polypeptides containing type I
signal sequences have the following physical attributes: the length of the type I signal sequence is approximately
15 to 25 primarily hydrophobic amino acid residues with a net positive charge in the extreme amino terminus; the
central region of the signal sequence must adopt an alpha-helical conformation in a hydrophobic environment; and the
region surrounding the actual site of cleavage is ideally six residues long, with small side-chain amino acids in the -1
and -3 positions.
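By way of a rough illustration, several of the type I attributes listed above can be checked in Python for a putative signal sequence of known length. This is a heuristic sketch, not the inventors' algorithm: the residue classes, the five-residue window taken as the "extreme amino terminus", the 50% hydrophobicity threshold and the function name are all assumptions, and the alpha-helical requirement is not evaluated at all.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed class of hydrophobic residues
SMALL = set("AGS")             # assumed class of small side-chain residues

def looks_like_type1_signal(protein: str, signal_length: int) -> bool:
    """Heuristic check of a putative type I signal sequence.

    signal_length is the number of residues preceding the proposed cleavage
    site, so the -1 and -3 positions are the last and third-from-last
    residues of the signal sequence.
    """
    signal = protein[:signal_length].upper()
    if len(signal) != signal_length or not 15 <= signal_length <= 25:
        return False
    n_terminus = signal[:5]
    net_charge = (sum(n_terminus.count(r) for r in "KR")
                  - sum(n_terminus.count(r) for r in "DE"))
    core = signal[5:-6]  # region between the N-terminus and the cleavage region
    hydrophobic_fraction = sum(r in HYDROPHOBIC for r in core) / len(core)
    return (net_charge > 0
            and hydrophobic_fraction >= 0.5
            and signal[-1] in SMALL
            and signal[-3] in SMALL)
```

A sequence passing this check is only a candidate; dedicated signal peptide prediction tools apply considerably richer models.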

Also known in the art is the type IV signal sequence, which is an example of the several types of functional signal
sequences which exist in addition to the type I signal sequence detailed above. Although functionally related, the type
IV signal sequence possesses a unique set of biochemical and physical attributes (Strom, M. S. and Lory, S., J.
Bacteriol. 174, 7345-7351; 1992). These are typically six to eight amino acids with a net basic charge followed by an
additional sixteen to thirty primarily hydrophobic residues. The cleavage site of a type IV signal sequence is typically
after the initial six to eight amino acids at the extreme amino terminus. In addition, all type IV signal sequences contain
a phenylalanine residue at the +1 site relative to the cleavage site.
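The type IV attributes can likewise be sketched as a simple Python screen. This is illustrative only: the function name, the hydrophobic residue class, the use of the first sixteen residues after the leader as the hydrophobic window and the 50% threshold are assumptions made for the example, and the test sequence is supplied merely to exercise the code.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed class of hydrophobic residues

def type4_leader_lengths(protein: str, min_hydrophobic_fraction: float = 0.5):
    """Return the leader lengths (6, 7 or 8) consistent with a type IV signal.

    A leader length qualifies when the leader carries a net basic charge,
    the residue at +1 is phenylalanine, and the sixteen residues starting
    at +1 are primarily hydrophobic.
    """
    seq = protein.upper()
    hits = []
    for leader_len in (6, 7, 8):
        leader = seq[:leader_len]
        downstream = seq[leader_len:leader_len + 16]
        if len(downstream) < 16:
            continue
        net_charge = (sum(leader.count(r) for r in "KR")
                      - sum(leader.count(r) for r in "DE"))
        fraction = sum(r in HYDROPHOBIC for r in downstream) / 16
        if net_charge > 0 and downstream[0] == "F" and fraction >= min_hydrophobic_fraction:
            hits.append(leader_len)
    return hits
```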

Studies of the cleavage sites of twenty-six bacterial lipoprotein precursors have allowed the definition of a consensus
amino acid sequence for lipoprotein cleavage. Nearly three-fourths of the bacterial lipoprotein precursors examined
contained the sequence L-(A,S)-(G,A)-C at positions -3 to +1, relative to the point of cleavage (Hayashi, S. and Wu,
H. C. Lipoproteins in bacteria. J. Bioenerg. Biomembr. 22, 451-471; 1990).
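The consensus L-(A,S)-(G,A)-C lends itself to a regular-expression search, sketched below in Python. The function name and the restriction of the search to the first 40 residues are assumptions of this sketch; the text does not state how far from the amino terminus the consensus may lie.

```python
import re

# L at -3, A or S at -2, G or A at -1, and C at +1 relative to the cleavage point.
LIPOPROTEIN_CONSENSUS = re.compile(r"L[AS][GA]C")

def lipoprotein_cleavage_sites(protein: str, search_window: int = 40):
    """Return 1-based positions of the +1 cysteine for each consensus match.

    Cleavage is taken to occur immediately before the reported cysteine.
    """
    region = protein[:search_window].upper()
    return [m.start() + 4 for m in LIPOPROTEIN_CONSENSUS.finditer(region)]
```

Because roughly one quarter of the precursors examined did not contain the exact consensus, a search of this kind would be expected to miss some genuine lipoproteins.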

It is well known that most anchored proteins found on the surface of gram-positive bacteria possess a highly
conserved carboxy terminal sequence. More than fifty such proteins from organisms such as S. pyogenes, S. mutans, E.
faecalis, S. pneumoniae, and others, have been identified based on their extracellular location and carboxy terminal
amino acid sequence (Fischetti, V. A. Gram-positive commensal bacteria deliver antigens to elicit mucosal and systemic
immunity. ASM News 62, 405-410; 1996). The conserved region is comprised of six charged amino acids at the extreme
carboxy terminus coupled to 15-20 hydrophobic amino acids presumed to function as a transmembrane domain.
Immediately adjacent to the transmembrane domain is a six amino acid sequence conserved in nearly all proteins
examined. The amino acid sequence of this region is L-P-X-T-G-X, where X is any amino acid.
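A search for the L-P-X-T-G-X sequence near the carboxy terminus can be sketched in Python as follows. This is a minimal sketch: the function name and the 50-residue carboxy terminal window are assumptions, and neither the hydrophobic transmembrane stretch nor the charged tail described above is verified. The test protein is invented.

```python
import re

SURFACE_ANCHOR_MOTIF = re.compile(r"LP.TG.")  # L-P-X-T-G-X, X being any residue

def find_anchor_motifs(protein: str, c_terminal_window: int = 50):
    """Return (position, motif) pairs found within the carboxy terminal window.

    position is the 1-based index of the leucine within the full protein.
    """
    seq = protein.upper()
    window_start = max(0, len(seq) - c_terminal_window)
    return [(m.start() + 1, m.group())
            for m in SURFACE_ANCHOR_MOTIF.finditer(seq, window_start)]

# Invented protein: filler residues, the motif, a hydrophobic stretch and a charged tail.
example = "M" + "A" * 60 + "LPETGE" + "GMLFGGLFSILGLALLV" + "RRNKKK"
motifs = find_anchor_motifs(example)
```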

Amino acid sequence similarities to proteins of known function by BLAST enable the assignment of putative
functions to novel amino acid sequences and allow for the selection of proteins thought to function outside the cell
wall. Such proteins are well known in the art and include "lipoprotein", "periplasmic", or "antigen".

An algorithm for selecting antigenic and immunogenic Staphylococcus aureus polypeptides including the foregoing
criteria was developed by the present inventors. Use of the algorithm by the inventors to select immunologically useful
Staphylococcus aureus polypeptides resulted in the selection of several ORFs which are predicted to be outer
membrane-associated proteins. These proteins are identified in Table 4, below, and shown in the Sequence Listing as SEQ
ID NOS:5,192 to 5,255. Thus the amino acid sequence of each of several antigenic Staphylococcus aureus polypeptides
listed in Table 4 can be determined, for example, by locating the amino acid sequence of the ORF in the Sequence
Listing. Likewise the polynucleotide sequence encoding each ORF can be found by locating the corresponding
polynucleotide SEQ ID in Tables 1, 2, or 3, and finding the corresponding nucleotide sequence in the sequence listing.

As will be appreciated by those of ordinary skill in the art, although a polypeptide representing an entire ORF may
be the closest approximation to a protein found in vivo, it is not always technically practical to express a complete ORF
in vitro. It may be very challenging to express and purify a highly hydrophobic protein by common laboratory methods.
As a result, the immunologically useful polypeptides described herein as SEQ ID NOS:5,192-5,255 may have been
modified slightly to simplify the production of recombinant protein, and are the preferred embodiments. In general,
nucleotide sequences which encode highly hydrophobic domains, such as those found at the amino terminal signal
sequence, are excluded for enhanced in vitro expression of the polypeptides. Furthermore, any highly hydrophobic
amino acid sequences occurring at the carboxy terminus are also excluded. Such truncated polypeptides include, for
example, the mature forms of the polypeptides expected to exist in nature.
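
The exclusion of highly hydrophobic terminal sequences can be pictured with a simple hydropathy scan. The sketch below is only an assumption about how such trimming might be automated: it uses the published Kyte-Doolittle hydropathy scale, which the present text does not name, and the window size and threshold are arbitrary illustrative values rather than parameters used by the inventors.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average hydropathy of a segment; unknown residues count as 0."""
    return sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / len(segment)


def trim_hydrophobic_termini(protein, window=15, threshold=1.6):
    """Remove residues from either end while the terminal window stays hydrophobic.

    window and threshold are hypothetical defaults chosen for illustration.
    """
    protein = protein.upper()
    begin, end = 0, len(protein)
    # Amino terminus (e.g. a signal sequence).
    while end - begin > window and mean_hydropathy(protein[begin:begin + window]) > threshold:
        begin += 1
    # Carboxy terminus (e.g. a membrane anchor).
    while end - begin > window and mean_hydropathy(protein[end - window:end]) > threshold:
        end -= 1
    return protein[begin:end]
```

Sequences shorter than the window are returned unchanged, so the sketch never trims a peptide down to nothing.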

Those of ordinary skill in the art can identify soluble portions of the polypeptides identified in Table 4 and, in the
case of the truncated polypeptide sequences shown as SEQ ID NOS:5,192-5,255, may obtain the complete predicted amino
acid sequence of each polypeptide by translating the corresponding polynucleotide sequence of the corresponding
ORF listed in Tables 1, 2 and 3 and found in the sequence listing.
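
Translating an ORF polynucleotide sequence into its predicted amino acid sequence can be sketched as follows. The codon assignments are those of the standard genetic code; the function name and the short test sequence are hypothetical. As a simplifying assumption, alternative bacterial start codons such as GTG or TTG are translated literally rather than as methionine.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna):
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_orf("ATGAAAGCTTAA"))
```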

Accordingly, polypeptides comprising the complete amino acid sequence of an immunologically useful polypeptide selected
from the group of polypeptides encoded by the ORFs identified in Table 4, or an amino acid sequence at least 95%
identical thereto, preferably at least 97% identical thereto, and most preferably at least 99% identical thereto, form an
embodiment of the invention; in addition, polypeptides comprising an amino acid sequence selected from the group of
amino acid sequences shown in the sequence listing as SEQ ID NOS:5,191-5,255, or an amino acid sequence at least
95% identical thereto, preferably at least 97% identical thereto and most preferably at least 99% identical thereto, form
an embodiment of the invention. Polynucleotides encoding the foregoing polypeptides also form part of the present
invention.

In another aspect, the invention provides a peptide or polypeptide comprising an epitope-bearing portion of a
polypeptide of the invention, particularly those epitope-bearing portions (antigenic regions) identified in Table 4. The
epitope-bearing portion is an immunogenic or antigenic epitope of a polypeptide of the invention. An "immunogenic
epitope" is defined as a part of a protein that elicits an antibody response when the whole protein is the immunogen.
On the other hand, a region of a protein molecule to which an antibody can bind is defined as an "antigenic epitope."
The number of immunogenic epitopes of a protein generally is less than the number of antigenic epitopes. See, for
instance, Geysen et al., Proc. Natl. Acad. Sci. USA 81:3998-4002 (1983).

As to the selection of peptides or polypeptides bearing an antigenic epitope (i.e., that contain a region of a protein
molecule to which an antibody can bind), it is well known in the art that relatively short synthetic peptides that mimic
part of a protein sequence are routinely capable of eliciting an antiserum that reacts with the partially mimicked protein.
See, for instance, Sutcliffe, J. G., Shinnick, T. M., Green, N. and Learner, R. A. (1983) "Antibodies that react with
predetermined sites on proteins", Science, 219:660-666. Peptides capable of eliciting protein-reactive sera are
frequently represented in the primary sequence of a protein, can be characterized by a set of simple chemical rules, and
are confined neither to immunodominant regions of intact proteins (i.e., immunogenic epitopes) nor to the amino or
carboxyl terminals. Antigenic epitope-bearing peptides and polypeptides of the invention are therefore useful to raise
antibodies, including monoclonal antibodies, that bind specifically to a polypeptide of the invention. See, for instance,
Wilson et al., Cell 37:767-778 (1984) at 777.

Antigenic epitope-bearing peptides and polypeptides of the invention preferably contain a sequence of at least
seven, more preferably at least nine and most preferably between about 15 and about 30 amino acids contained within
the amino acid sequence of a polypeptide of the invention. Non-limiting examples of antigenic polypeptides or peptides
that can be used to generate S. aureus specific antibodies include a polypeptide comprising peptides shown in Table
4 below. These polypeptide fragments have been determined to bear antigenic epitopes of the indicated S. aureus proteins
by the analysis of the Jameson-Wolf antigenic index, a representative sample of which is shown in Figure 3.

The epitope-bearing peptides and polypeptides of the invention may be produced by any conventional means.
See, e.g., Houghten, R. A. (1985) General method for the rapid solid-phase synthesis of large numbers of peptides:
specificity of antigen-antibody interaction at the level of individual amino acids. Proc. Natl. Acad. Sci. USA 82:
5131-5135; this "Simultaneous Multiple Peptide Synthesis (SMPS)" process is further described in U.S. Patent No.
4,631,211 to Houghten et al. (1986). Epitope-bearing peptides and polypeptides of the invention are used to induce
antibodies according to methods well known in the art. See, for instance, Sutcliffe et al., supra; Wilson et al., supra;
Chow, M. et al., Proc. Natl. Acad. Sci. USA 82:910-914; and Bittle, F. J. et al., J. Gen. Virol. 66:2347-2354 (1985).

Immunogenic epitope-bearing peptides of the invention, i.e., those parts of a protein that elicit an antibody response
when the whole protein is the immunogen, are identified according to methods known in the art. See, for instance,
Geysen et al., supra. Further still, U.S. Patent No. 5,194,392 to Geysen (1990) describes a general method of detecting
or determining the sequence of monomers (amino acids or other compounds) which is a topological equivalent of the
epitope (i.e., a "mimotope") which is complementary to a particular paratope (antigen binding site) of an antibody of
interest. More generally, U.S. Patent No. 4,433,092 to Geysen (1989) describes a method of detecting or determining
a sequence of monomers which is a topographical equivalent of a ligand which is complementary to the ligand binding
site of a particular receptor of interest. Similarly, U.S. Patent No. 5,480,971 to Houghten, R. A. et al. (1996) on
Peralkylated Oligopeptide Mixtures discloses linear C1-C7-alkyl peralkylated oligopeptides and sets and libraries of such
peptides, as well as methods for using such oligopeptide sets and libraries for determining the sequence of a
peralkylated oligopeptide that preferentially binds to an acceptor molecule of interest. Thus, non-peptide analogs of the
epitope-bearing peptides of the invention also can be made routinely by these methods.

Table 4 lists immunologically useful polypeptides identified by an algorithm which locates novel Staphylococcus
aureus outermembrane proteins, as is described above. Also listed are epitopes or "antigenic regions" of each of the
identified polypeptides. The antigenic regions, or epitopes, are delineated by two numbers x-y, where x is the number
of the first amino acid in the open reading frame included within the epitope and y is the number of the last amino acid
in the open reading frame included within the epitope. For example, the first epitope in ORF 168-6 is comprised of
amino acids 36 to 45 of SEQ ID NO:5,192, as is described in Table 4. The inventors have identified several epitopes
for each of the antigenic polypeptides identified in Table 4. Accordingly, forming part of the present invention are
polypeptides comprising an amino acid sequence of one or more antigenic regions identified in Table 4. The invention
further provides polynucleotides encoding such polypeptides.
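
The x-y notation for antigenic regions maps directly onto a substring of the ORF translation, with both end points included and numbering starting at 1. The following minimal sketch shows that mapping; the helper names and the placeholder sequence are hypothetical and do not reproduce any sequence from the Sequence Listing.

```python
def parse_region(region):
    """Split an 'x-y' antigenic region into integer first and last positions."""
    first, last = region.split("-")
    return int(first), int(last)


def extract_epitope(orf_protein, region):
    """Return residues x through y (1-based, inclusive) of the ORF translation."""
    first, last = parse_region(region)
    if not 1 <= first <= last <= len(orf_protein):
        raise ValueError(f"region {region} lies outside the sequence")
    return orf_protein[first - 1:last]


# Placeholder 60-residue sequence, used only to exercise the function.
placeholder = "ACDEFGHIKLMNPQRSTVWY" * 3
print(extract_epitope(placeholder, "36-45"))
```

A region such as 36-45 therefore always yields y - x + 1 = 10 residues.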

The present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are
substantially equivalent to those herein described. As used herein, substantially equivalent can refer both to nucleic acid
and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more
substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity
between reference and subject sequences. For purposes of the present invention, sequences having equivalent biological
activity, and equivalent expression characteristics are considered substantially equivalent. For purposes of determining
equivalence, truncation of the mature sequence should be disregarded.

The invention further provides methods of obtaining, from other strains of Staphylococcus aureus, homologs of the
fragments of the Staphylococcus aureus genome of the present invention and homologs of the proteins encoded by
the ORFs of the present invention. As used herein, a sequence or protein of Staphylococcus aureus is defined as a
homolog of one of the Staphylococcus aureus fragments or contigs, or of a protein encoded by one of the ORFs
of the present invention, if it shares significant homology to one of the fragments of the Staphylococcus aureus genome
of the present invention or a protein encoded by one of the ORFs of the present invention. Specifically, by using the
sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque
hybridization, one skilled in the art can obtain homologs.

As used herein, two nucleic acid molecules or proteins are said to "share significant homology" if the two contain
regions which possess greater than 85% sequence (amino acid or nucleic acid) homology. Preferred homologs in this
regard are those with more than 90% homology. Especially preferred are those with 93% or more homology. Among
especially preferred homologs those with 95% or more homology are particularly preferred. Very particularly preferred
among these are those with 97% and even more particularly preferred among those are homologs with 99% or more
homology. The most preferred homologs among these are those with 99.9% homology or more. It will be understood
that, among measures of homology, identity is particularly preferred in this regard.
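
The homology thresholds above can be applied mechanically once a percent identity has been computed. The sketch below makes two assumptions that the text does not state: identity is counted over the aligned columns in which neither sequence has a gap, and the 97% tier is treated as "97% or more". The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# (threshold, inclusive): ">85%" and ">90%" are strict, the rest are "or more".
TIERS = [(99.9, True), (99.0, True), (97.0, True), (95.0, True),
         (93.0, True), (90.0, False), (85.0, False)]


def highest_tier(identity):
    """Return the highest threshold satisfied, or None below significance."""
    for threshold, inclusive in TIERS:
        if identity > threshold or (inclusive and identity == threshold):
            return threshold
    return None


print(highest_tier(percent_identity("ACDEFGHIKL", "ACDEFGHIKV")))
```

Because "more than 90%" is a strict inequality, a pair that is exactly 90% identical falls in the 85% tier.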

Region specific primers or probes derived from the nucleotide sequence provided in SEQ ID NOS:1-5,191 or from
a nucleotide sequence at least 95%, particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ
ID NOS:1-5,191 can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing
cloned DNA encoding a homolog. Methods suitable to this aspect of the present invention are well known and have
been described in great detail in many publications such as, for example, Innis et al., PCR PROTOCOLS, Academic
Press, San Diego, CA (1990).

When using primers derived from SEQ ID NOS:1-5,191 or from a nucleotide sequence having an aforementioned
identity to a sequence of SEQ ID NOS:1-5,191, one skilled in the art will recognize that by employing high stringency
conditions (e.g., annealing at 50-60°C in 6X SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC) only
sequences which are greater than 75% homologous to the primer will be amplified. By employing lower stringency
conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X SSPC),
sequences which are greater than 40-50% homologous to the primer will also be amplified.
When using DNA probes derived from SEQ ID NOS:1-5,191, or from a nucleotide sequence having an aforementioned
identity to a sequence of SEQ ID NOS:1-5,191, for colony/plaque hybridization, [...]

[...] used as a source for homologs of the present invention so long as the organism naturally [...]
are bacteria which are closely related to Staphylococcus aureus.

ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION 



Each ORF provided in Tables 1 and 2 is identified with a function by homology to a known gene or [...].
[...] skilled in [...] use the polypeptides of the present invention [...]
industrial purposes consistent with the type of putative identification of the polypeptide. Such [...]
to use [...] ORF in a manner similar to [...]
the identification is made; for example, to ferment a particular sugar source or to produce a particular [...]. A
variety of reviews illustrative of this aspect of the invention are available, [...]
use of enzymes, for example, BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY HANDBOOK, [...]
Macmillan Publications, Ltd., NY (1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds., Elsevier
[...] Amsterdam, The Netherlands (1985). A variety of exemplary uses [...] this and similar
aspects of the present invention are discussed below.



1. Biosynthetic Enzymes



[...] reading frames encoding proteins involved in mediating the catalytic reactions involved in intermediary and
macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions include
[...] metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes involved in fermentation, enzymes
involved in ATP proton motor force conversion, enzymes involved in broad regulatory function [...]
[...] interest are polypeptides involved in the degradation of intermediary metabolites as well as
non-macromolecular metabolism. Such enzymes include amylases, glucose oxidases, and catalase.

Proteolytic enzymes are another class of commercially important enzymes. Proteolytic enzymes find use in a
number of industrial [...]
[...] juices, in extraction of vegetable oil and in the maceration of fruits and vegetables to
give unicellular fruits. A detailed review of the proteolytic enzymes used in the food industry is provided by [...]
[...] 79 (19[...]6) and Voragen et al. in BIOCATALYSTS IN AGRICULTURAL BIOTECHNOLOGY,
[...]er et al., Eds., American Chemical Society Symposium Series 389:93 (1989).

The metabolism of sugars is an important aspect of the primary metabolism of Staphylococcus aureus. Enzymes
involved in the degradation of sugars, such as, particularly, glucose, galactose, fructose and xylose, can be used in
industrial fermentation. Some of the important sugar transforming enzymes, from a commercial viewpoint, include
[...] isomerase. Other metabolic [...] have found commercial [...]
[...] acid (KGA). KGA is an intermediate in the commercial production of ascorbic
acid [...] described by Krueger et al. [...] et al., Eds., Verlag [...]

Glucose oxidase (GOD) is commercially available and has been used in purified form as well as in an immobilized
form for the deoxygenation of beer. See, for instance, Hartmeir et al., Biotechnology Letters 1:21 (1979). [...] most
[...] application [...] industrial scale fermentation [...] gluconic acid. Market for gluconic [...], which are
used in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industry, as described,
for example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS AND FUNGI, [...] Eds.,
Academic Press, New York (1985). In addition to industrial applications, GOD has found applications in [...]
quantitative determination of glucose in body fluids [...] recently in biotechnology for analyzing syrups from starch and
cellulose hydrolysates. This application is described in Owusu et al., Biochem. et Biophysica. Acta. 872:83 (1986), for
instance.

The main sweetener used in the world today is sugar which comes from sugar beets and sugar cane. In the field
of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today. Initially, soluble
enzymes were used and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of
Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, Massachusetts (1990)). Today, the use of
glucose-produced high fructose syrups is by far the largest industrial business using immobilized enzymes. A review of
the industrial use of these enzymes is provided by Jorgensen, Starch 40:307 (1988).

Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the
largest volumes of microbial enzymes used in the industrial sector. Because of their industrial importance, there is a
large body of published and unpublished information regarding the use of these enzymes in industrial processes. (See
Faultman et al., Acid Proteases Structure Function and Biology, Tang, J., ed., Plenum Press, New York (1977) and
Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et al., Report Industrial
Enzymes by 1990, Hel Hepner & Associates, London (1986)).

Another class of commercially usable proteins of the present invention are the microbial lipases, described by, for
instance, Macrae et al., Philosophical Transactions of the Royal Society of London 310:227 (1985) and Poserke,
Journal of the American Oil Chemist Society 61:1758 (1984). A major use of lipases is in the fat and oil industry for the
production of neutral glycerides using lipase catalyzed inter-esterification of readily available triglycerides. Applications
of lipases include use as a detergent additive to facilitate the removal of fats from fabrics in the course of the
washing procedures.

The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the synthesis of complex
organic molecules is gaining popularity at a great rate. One area of great interest is the preparation of chiral
intermediates. Preparation of chiral intermediates is of interest to a wide range of synthetic chemists, particularly those
scientists involved with the preparation of new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al.,
Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Florida (1990)).
The following reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters,
phosphate esters, amides and nitriles, esterification reactions, trans-esterification reactions, synthesis of amides,
reduction of alkanones and oxoalkanates, oxidation of alcohols to carbonyl compounds, oxidation of sulfides to sulfoxides,
and carbon-carbon bond forming reactions such as the aldol reaction.

When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation
and organic synthesis it is sometimes necessary to consider the respective advantages and disadvantages of using a
microorganism as opposed to an isolated enzyme. The pros and cons of using a whole cell system on the one hand or an
isolated partially purified enzyme on the other hand have been described in detail by Bud et al., Chemistry in Britain
(1987), p. 127.

Amino transferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic
production of amino acids. The advantage of using microbial based enzyme systems is that the amino transferase
enzymes catalyze the stereo-selective synthesis of only L-amino acids and generally possess uniformly high catalytic
rates. A description of the use of amino transferases for amino acid production is provided by Roselle-David, Methods
of Enzymology 136:479 (1987).

Another category of useful proteins encoded by the ORFs of the present invention includes enzymes involved in
nucleic acid synthesis, repair, and recombination. A variety of commercially important enzymes have previously been
isolated from members of Staphylococcus aureus. These include Sau3A and Sau96I.

2. Generation of Antibodies

As described here, the proteins of the present invention, as well as homologs thereof, can be used in a variety of
procedures and methods known in the art which are currently applied to other proteins. The proteins of the present
invention can further be used to generate an antibody which selectively binds the protein. Such antibodies can be
either monoclonal or polyclonal antibodies, as well as fragments of these antibodies, and humanized forms.

The invention further provides antibodies which selectively bind to one of the proteins of the present invention and 
hybridomas which produce these antibodies. A hybridoma is an immortalized cell line which is capable of secreting a 
specific monoclonal antibody. 

In general, techniques for preparing polyclonal and monoclonal antibodies as well as hybridomas capable of
producing the desired antibody are well known in the art (Campbell, A. M., MONOCLONAL ANTIBODY TECHNOLOGY:
LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, Elsevier Science Publishers,
Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21 (1980), Kohler and Milstein, Nature
256:495-497 (1975)), the trioma technique, the human B-cell hybridoma technique (Kozbor et al., Immunology Today
4:72 (1983), pgs. 77-96 of Cole et al., in MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc.
(1985)).

Any animal (mouse, rabbit, etc.) which is known to produce antibodies can be immunized with the pseudogene
polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous or intraperitoneal
injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by the ORF of
the present invention used for immunization will vary based on the animal which is immunized, the antigenicity of the
peptide and the site of injection.

The protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase
the protein's antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but
are not limited to, coupling the antigen with a heterologous protein (such as globulin or galactosidase) or through the
inclusion of an adjuvant during immunization.

For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma cells, such 
as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody producing hybridoma cells. 

Any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces
an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western
blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).

Hybridomas secreting the desired antibodies are cloned and the class and subclass are determined using procedures
known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).

Techniques described for the production of single chain antibodies (U.S. Patent 4,946,778) can be adapted to
produce single chain antibodies to proteins of the present invention.

For polyclonal antibodies, antibody containing antisera is isolated from the immunized animal and is screened for 
the presence of antibodies with the desired specificity using one of the above-described procedures. 

The present invention further provides the above-described antibodies in detectably labelled form. Antibodies can
be detectably labelled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels
(such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.),
paramagnetic atoms, etc. Procedures for accomplishing such labelling are well-known in the art, for example see
Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engval,
E. et al., Immunol. 109:129 (1972); Goding, J. W., J. Immunol. Meth. 13:215 (1976).

The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells
or tissues in which a fragment of the Staphylococcus aureus genome is expressed.

The present invention further provides the above-described antibodies immobilized on a solid support. Examples
of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose,
acrylic resins such as polyacrylamide, and latex beads. Techniques for coupling antibodies to such solid supports
are well known in the art (Weir, D. M. et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific
Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34, Academic Press, N.Y. (1974)).
The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for
immunoaffinity purification of the proteins of the present invention.

3. Diagnostic Assays and Kits

The present invention further provides methods to identify the expression of one of the ORFs of the present
invention, or homolog thereof, in a test sample, using one of the DFs, antigens or antibodies of the present invention.

In detail, such methods comprise incubating a test sample with one or more of the antibodies, or one or more of
the DFs, or one or more antigens of the present invention and assaying for binding of the DFs, antigens or antibodies
to components within the test sample.

Conditions for incubating a DF, antigen or antibody with a test sample vary. Incubation conditions depend on the
format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used
in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification
or immunological assay formats can readily be adapted to employ the DFs, antigens or antibodies of the present
invention. Examples of such assays can be found in Chard, T., An Introduction to Radioimmunoassay and Related
Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in
Immunocytochemistry, Academic Press, Orlando, FL Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice
and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science
Publishers, Amsterdam, The Netherlands (1985); and PCT publication WO95/32291, all of which are hereby
incorporated herein by reference.

The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids
such as sputum, blood, serum, plasma, or urine. The test sample used in the above-described method will vary based
on the assay format, nature of the detection method and the tissues, cells or extracts used as the sample to be assayed.
Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be
adapted in order to obtain a sample which is compatible with the system utilized.

In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry
out the assays of the present invention.

Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers
which comprises: (a) a first container comprising one of the DFs, antigens or antibodies of the present invention; and
(b) one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting
presence of a bound DF, antigen or antibody.

In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such
containers include small glass containers, plastic containers or strips of plastic or paper. Such containers allow one
to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are
not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one
compartment to another. Such containers will include a container which will accept the test sample, a container which
contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline,
Tris-buffers, etc.), and containers which contain the reagents used to detect the bound antibody, antigen or DF.

Types of detection reagents include labelled nucleic acid probes, labelled secondary antibodies, or in the
alternative, if the primary antibody is labelled, the enzymatic, or antibody binding reagents which are capable of reacting with
the labelled antibody. One skilled in the art will readily recognize that the disclosed DFs, antigens and antibodies of the
present invention can be readily incorporated into one of the established kit formats which are well known in the art.

4. Screening Assay for Binding Agents 

Using the isolated proteins of the present invention, the present invention further provides methods of obtaining
and identifying agents which bind to a protein encoded by one of the ORFs of the present invention or to one of the
Staphylococcus aureus fragments and contigs herein described.

In general, such methods comprise steps of: 

(a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or an isolated
fragment of the Staphylococcus aureus genome; and

(b) determining whether the agent binds to said protein or said fragment. 

The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives,
or other pharmaceutical agents. The agents can be selected and screened at random or rationally selected or designed
using protein modeling techniques.

For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected 
at random and are assayed for their ability to bind to the protein encoded by the ORF of the present invention. 

Alternatively, agents may be rationally selected or designed. As used herein, an agent is said to be "rationally
selected or designed" when the agent is chosen based on the configuration of the particular protein. For example, one
skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the
like capable of binding to a specific peptide sequence in order to generate rationally designed antipeptide peptides,
for example see Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," In Synthetic Peptides, A User's
Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspczak et al., Biochemistry 28:9230-8 (1989), or pharmaceutical
agents, or the like.

In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to 
control gene expression through binding to one of the ORFs or EMFs of the present invention. As described above, 
such agents can be randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a skilled 
artisan to design sequence specific or element specific agents, modulating the expression of either a single ORF or 
multiple ORFs which rely on the same EMF for expression control. 

One class of DNA binding agents are agents which contain base residues which hybridize or form a triple helix by 
binding to DNA or RNA. Such agents can be based on the classic phosphodiester, ribonucleic acid backbone, or can 
be a variety of sulfhydryl or polymeric derivatives which have base attachment capacity. 

Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary
to a region of the gene involved in transcription (triple helix - see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney
et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense - Okano,
J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca
Raton, FL (1988)). Triple helix-formation optimally results in a shut-off of RNA transcription from DNA, while antisense
RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated
to be effective in model systems. Information contained in the sequences of the present invention can be used to design
antisense and triple helix-forming oligonucleotides, and other DNA binding agents.

5. Pharmaceutical Compositions and Vaccines

The present invention further provides pharmaceutical agents which can be used to modulate the growth or
pathogenicity of Staphylococcus aureus, or another related organism, in vivo or in vitro. As used herein, a "pharmaceutical
agent" is defined as a composition of matter which can be formulated using known techniques to provide a
pharmaceutical composition. As used herein, the "pharmaceutical agents of the present invention" refers to the
pharmaceutical agents which are derived from the proteins encoded by the ORFs of the present invention or are agents
which are identified using the herein described assays.

As used herein, a pharmaceutical agent is said to "modulate the growth or pathogenicity of Staphylococcus aureus
or a related organism, in vivo or in vitro," when the agent reduces the rate of growth, rate of division, or viability of the
organism in question. The pharmaceutical agents of the present invention can modulate the growth or pathogenicity
of an organism in many fashions, although an understanding of the underlying mechanism of action is not needed to
practice the use of the pharmaceutical agents of the present invention. Some agents will modulate the growth or
pathogenicity by binding to an important protein, thus blocking the biological activity of the protein, while other agents may
bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more prone
to attack by the body's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs
of the present invention and serve as a vaccine. The development and use of vaccines derived from membrane
associated polypeptides are well known in the art. The inventors have identified particularly preferred immunogenic
Staphylococcus aureus polypeptides for use as vaccines. Such immunogenic polypeptides are described above and
summarized in Table 4, below.

As used herein, a "related organism" is a broad term which refers to any organism whose growth or pathogenicity 
can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism will 
contain a homolog of the protein which is the target of the pharmaceutical agent or the protein used as a vaccine. As 
such, related organisms do not need to be bacterial but may be fungal or viral pathogens. 

The pharmaceutical agents and compositions of the present invention may be administered in a convenient manner,
such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal
routes. The pharmaceutical compositions are administered in an amount which is effective for treatment and/or
prophylaxis of the specific indication. In general, they are administered in an amount of at least about 1 mg/kg body weight
and in most cases they will be administered in an amount not in excess of about 1 g/kg body weight per day. In most
cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body weight daily, taking into account the routes of
administration, symptoms, etc.

The agents of the present invention can be used in native form or can be modified to form a chemical derivative.
As used herein, a molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical
moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological
half life, etc. The moieties may alternatively decrease the toxicity of the molecule, eliminate or attenuate any undesirable
side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in, among other sources,
REMINGTON'S PHARMACEUTICAL SCIENCES (1980) cited elsewhere herein.

For example, such moieties may change an immunological character of the functional derivative, such as affinity 
for a given antibody. Such changes in immunomodulation activity are measured by the appropriate assay, such as a 
competitive type immunoassay. Modifications of such protein properties as redox or thermal stability, biological half- 
life, hydrophobicity, susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into multimers 
also may be effected in this way and can be assayed by methods well known to the skilled artisan. 

The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient 
by any suitable means (e.g., inhalation, intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is 
preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood 
or tissue in which the growth of the organism is to be controlled. To achieve an effective blood concentration, the 
preferred method is to administer the agent by injection. The administration may be by continuous infusion, or by single 
or multiple injections. 

In providing a patient with one of the agents of the present invention, the dosage of the administered agent will 
vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical 
history, etc. In general, it is desirable to provide the recipient with a dosage of agent which is in the range of from about 
1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered. The therapeu- 
tically effective dose can be lowered by using combinations of the agents of the present invention or another agent. 

As used herein, two or more compounds or agents are said to be administered "in combination" with each other 
when either (1) the physiological effects of each compound, or (2) the serum concentrations of each compound can 
be measured at the same time. The composition of the present invention can be administered concurrently with, prior
to, or following the administration of the other agent. 

The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to 
decrease the rate of growth (as defined above) of the target organism. 

The administration of the agent(s) of the invention may be for either a "prophylactic" or "therapeutic" purpose.

When provided prophylactically, the agent(s) are provided in advance of any symptoms indicative of the organism's
growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of
any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an
indication of infection. The therapeutic administration of the compound(s) serves to attenuate the pathological
symptoms of the infection and to increase the rate of recovery.

The agents of the present invention are administered to a subject, such as a mammal, or a patient, in a
pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be
"pharmacologically acceptable" if its administration can be tolerated by a recipient patient. Such an agent is said to be administered
in a "therapeutically effective amount" if the amount administered is physiologically significant. An agent is
physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.

The agents of the present invention can be formulated according to known methods to prepare pharmaceutically
useful compositions, whereby these materials, or their functional derivatives, are combined in admixture with a
pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins,
e.g., human serum albumin, are described, for example, in REMINGTON'S PHARMACEUTICAL SCIENCES, 16th Ed.,
Osol, A., Ed., Mack Publishing, Easton PA (1980). In order to form a pharmaceutically acceptable composition suitable
for effective administration, such compositions will contain an effective amount of one or more of the agents of the
present invention, together with a suitable amount of carrier vehicle.

Additional pharmaceutical methods may be employed to control the duration of action. Controlled release preparations
may be achieved through the use of polymers to complex or absorb one or more of the agents of the present invention.
The controlled delivery may be effectuated by a variety of well known techniques, including formulation with
macromolecules such as, for example, polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate,
methylcellulose, carboxymethylcellulose, or protamine sulfate, adjusting the concentration of the macromolecules and the agent
in the formulation, and by appropriate use of methods of incorporation, which can be manipulated to effectuate a desired
time course of release. Another possible method to control the duration of action by controlled release preparations is
to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino
acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of incorporating these
agents into polymeric particles, it is possible to entrap these materials in microcapsules prepared, for example, by
coacervation techniques or by interfacial polymerization with, for example, hydroxymethylcellulose or gelatine
microcapsules and poly(methylmethacrylate) microcapsules, respectively, or in colloidal drug delivery systems, for example,
liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules, or in macroemulsions. Such
techniques are disclosed in REMINGTON'S PHARMACEUTICAL SCIENCES (1980).

The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or
more of the ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be
a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals
or biological products, which notice reflects approval by the agency of manufacture, use or sale for human
administration.

In addition, the agents of the present invention may be employed in conjunction with other therapeutic compounds. 

6. Shot-Gun Approach to Megabase DNA Sequencing

The present invention further demonstrates that a large sequence can be sequenced using a random shotgun 
approach. This procedure, described in detail in the examples that follow, has eliminated the up front cost of isolating 
and ordering overlapping or contiguous subclones prior to the start of the sequencing protocols. 

Certain aspects of the present invention are described in greater detail in the examples that follow. The examples 
are provided by way of illustration. Other aspects and embodiments of the present invention are contemplated by the
inventors, as will be clear to those of skill in the art from reading the present disclosure. 

ILLUSTRATIVE EXAMPLES

LIBRARIES AND SEQUENCING 

1. Shotgun Sequencing Probability Analysis



The overall strategy for a shotgun approach to genome sequencing follows from the Lander and Waterman
(Lander and Waterman, Genomics 2:231 (1988)) application of the equation for the Poisson distribution. According
to this treatment, the probability, P0, that any given base in a sequence of size L, in nucleotides, is not sequenced after
a certain amount, n, in nucleotides, of random sequence has been determined can be calculated by the equation
P0 = e^(-m), where m = n/L is the fold coverage. For instance, for a genome of 2.8 Mb, m = 1 when 2.8 Mb of sequence has
been randomly generated (1X coverage). At that point, P0 = e^(-1) = 0.37. The probability that any given base has not
been sequenced is the same as the probability that any region of the whole sequence L has not been determined and,
therefore, is equivalent to the fraction of the whole sequence that has yet to be determined. Thus, at one-fold coverage,
approximately 37% of a polynucleotide of size L, in nucleotides, has not been sequenced. When 14 Mb of sequence
has been generated, coverage is 5X for a 2.8 Mb sequence and the unsequenced fraction drops to e^(-5), or about
0.67%. 5X coverage of a 2.8 Mb sequence can be attained by sequencing approximately 17,000 random clones from
both insert ends with an average sequence read length of 410 bp.

Similarly, the total gap length, G, is determined by the equation G = L e^(-m), and the average gap size, g, follows the
equation g = L/N, where N is the number of random sequence fragments (here about 34,000 reads from 17,000 clones).
Thus, 5X coverage leaves about [...] gaps averaging about 82 bp in size in a polynucleotide sequence 2.8 Mb long.

The treatment above is essentially that of Lander and Waterman, Genomics 2:231 (1988).
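
The arithmetic of this treatment can be checked with a short calculation. The following Python sketch is offered only as an illustration: the function name and the dictionary keys are hypothetical, and the expected number of gaps, N e^(-m), is the usual Lander and Waterman result rather than a formula stated in the text above. L is the genome length, N the number of reads, and m the fold coverage.

```python
import math

def shotgun_stats(genome_len, n_reads, read_len):
    """Poisson estimates for a random shotgun project (all lengths in bp)."""
    m = n_reads * read_len / genome_len      # fold coverage, m = n / L
    p0 = math.exp(-m)                        # P0 = e^(-m), fraction not yet sequenced
    return {
        "coverage": m,
        "unsequenced_fraction": p0,
        "total_gap_bp": genome_len * p0,     # G = L * e^(-m)
        "expected_gaps": n_reads * p0,       # N * e^(-m); assumption, not from the text
        "avg_gap_bp": genome_len / n_reads,  # g = L / N
    }

# 17,000 clones read from both insert ends, about 410 bp per read, 2.8 Mb genome
stats = shotgun_stats(2_800_000, 2 * 17_000, 410)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

With these inputs the coverage comes out just under 5X and the average gap size close to the 82 bp quoted above.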



2. Random Library Construction 



In order to approximate the random model described above during actual sequencing, a nearly ideal library of
cloned genomic fragments is required. [...]

Staphylococcus aureus DNA was prepared by phenol extraction. A mixture containing 600 ug DNA in 3.3 ml of
[...], 10 mM Tris-HCl, 1 mM Na-EDTA, 30% glycerol was sonicated for 1 min. at 0°C in a Branson
sonicator [...] energy setting using a 3 mm probe. The DNA was [...].

To create blunt-ends, a 100 ul aliquot of the resuspended DNA was digested with 5 units of BAL31 nuclease (New
England BioLabs) [...]. The digested DNA was [...] precipitated, redissolved in 100 ul TE buffer, and then
size-fractionated by electrophoresis through a 1.0% low melting temperature agarose gel. The section containing DNA
fragments 1.6-2.0 kb in size was excised from the gel, the LGT agarose was melted and the resulting solution was
extracted with phenol to separate the agarose from the DNA. DNA was ethanol precipitated and redissolved in 20 ul of
TE buffer for ligation to vector.

A two-step ligation procedure was used to produce a plasmid library with 97% inserts, of which >99% were single
inserts. The first ligation mixture (50 ul) contained 2 ug of DNA fragments, 2 ug pUC18 DNA (Pharmacia) cut with
SmaI [...] alkaline phosphatase, and 10 units of T4 ligase [...]. The ligation mixture then was phenol extracted and
ethanol precipitated, and the precipitated DNA was dissolved in 20 ul TE buffer and electrophoresed on a 1.0% low
melting agarose gel. Discrete bands in a ladder were visualized by ethidium bromide-staining and UV illumination and
identified by size as insert (i), vector (v), v+i, [...]. The v+i DNA then was blunt-ended by T4 polymerase treatment for
5 min. at 37°C in a reaction mixture containing the v+i linears, 500 uM each of the 4 dNTPs, and 9 units of T4
polymerase (New England BioLabs), under recommended buffer conditions. After phenol extraction and ethanol
precipitation the repaired v+i linears were [...]. The final ligation to produce circles was carried out in a 50 ul
reaction [...] overnight. After 10 min. at 70°C the following day, the reaction was stored at -20°C.

This two-stage procedure resulted in a molecularly random collection of single-insert plasmid recombinants with
minimal contamination from double-insert chimeras (<1%) or free vector (<3%).

Since deviation from randomness can arise from propagation of the DNA in the host, E. coli host cells deficient in all
recombination and restriction functions (A. Greener, Strategies 3(1):5 (1990)) were used to prevent rearrangements,
deletions, and loss of clones by restriction. Furthermore, transformed cells were plated directly on antibiotic diffusion
plates to avoid the usual broth recovery phase which allows multiplication and selection of the most rapidly growing cells.

Plating was carried out as follows. A 100 ul aliquot of Epicurian Coli SURE II Supercompetent Cells (Stratagene
200152) was thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7 ul aliquot of 1.42 M
beta-mercaptoethanol was added to the aliquot of cells to a final concentration of 25 mM. Cells were incubated on ice for
10 min. A 1 ul aliquot of the final ligation was added to the cells and incubated on ice for 30 min. The cells were heat
pulsed for 30 sec. at 42°C and placed back on ice for 2 min. The outgrowth period in liquid culture was eliminated
from this protocol in order to minimize the preferential growth of any given transformed cell. Instead, the transformation
mixture was plated directly on a nutrient rich SOB plate containing a 5 ml bottom layer of SOB agar (5% SOB agar:
20 g tryptone, 5 g yeast extract, 0.5 g NaCl, 1.5% Difco Agar per liter of media). The 5 ml bottom layer is supplemented
with 0.4 ml of 50 mg/ml ampicillin per 100 ml SOB agar. The 15 ml top layer of SOB agar is supplemented with 1 ml
X-Gal (2%), 1 ml MgCl2 (1 M), and 1 ml MgSO4/100 ml SOB agar. The 15 ml top layer was poured just prior to plating.
Our titer was approximately 100 colonies/10 ul aliquot of transformation.
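
The concentrations quoted in the plating protocol follow from simple dilution arithmetic. The Python sketch below is illustrative only; the helper name is hypothetical and it assumes that volumes are additive. On that assumption the beta-mercaptoethanol works out to slightly under the nominal 25 mM, and the ampicillin in the bottom layer to roughly 200 ug/ml.

```python
def final_concentration(stock_conc, added_vol, start_vol):
    """Concentration after adding added_vol of stock to start_vol (same volume units)."""
    return stock_conc * added_vol / (start_vol + added_vol)

# 1.7 ul of 1.42 M beta-mercaptoethanol added to 100 ul of cells, reported in mM
bme_mM = final_concentration(1.42, 1.7, 100.0) * 1000

# 0.4 ml of 50 mg/ml ampicillin added to 100 ml of SOB agar, reported in ug/ml
amp_ug_per_ml = final_concentration(50.0, 0.4, 100.0) * 1000

print(f"beta-mercaptoethanol: {bme_mM:.1f} mM")
print(f"ampicillin: {amp_ug_per_ml:.0f} ug/ml")
```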

All colonies were picked for template preparation regardless of size. Thus, only clones lost due to "poison" DNA
or deleterious gene products would be deleted from the library, resulting in a slight increase in gap number over that
expected.

3. Random DNA Sequencing 

High quality double stranded DNA plasmid templates were prepared using an alkaline lysis method developed in 
collaboration with 5 Prime → 3 Prime, Inc. (Boulder, CO). Plasmid preparation was performed in a 96-well format for all
stages of DNA preparation from bacterial growth through final DNA purification. Average template concentration was 
determined by running 25% of the samples on an agarose gel. DNA concentrations were not adjusted. 

Templates were also prepared from a Staphylococcus aureus lambda genomic library. An unamplified library was
constructed in Lambda DASH II vector (Stratagene). Staphylococcus aureus DNA (> 100 kb) was partially digested in
a reaction mixture (200 ul) containing 50 ug DNA, 1X Sau3AI buffer, 20 units Sau3AI for 6 min. at 23°C. The digested
DNA was phenol-extracted and centrifuged over a 10-40% sucrose gradient. Fractions containing genomic DNA of
15-25 kb were recovered by precipitation. One ul of fragments was used with 1 ul of DASH II vector (Stratagene) in
the recommended ligation reaction. One ul of the ligation mixture was used per packaging reaction following the
recommended protocol with the Gigapack II XL Packaging Extract. Phage were plated directly without amplification from
the packaging mixture (after dilution with 500 ul of recommended SM buffer and chloroform treatment). Yield was about
2.5x10^9 pfu/ul.

An amplified library was prepared from the primary packaging mixture according to the manufacturer's protocol.
The amplified library is stored frozen in 7% dimethylsulfoxide. The phage titer is approximately 1x10 s pfu/ml.

Mini-liquid lysates (0.1 ul) are prepared from randomly selected plaques and template is prepared by long range 
PCR. Samples are PCR amplified using modified T3 and T7 primers, and Elongase Supermix (LTI). 

Sequencing reactions are carried out on plasmid templates using a combination of two workstations (BIOMEK
1000 and Hamilton Microlab 2200) and the Perkin-Elmer 9600 thermocycler with Applied Biosystems PRISM Ready
Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers.
Dye terminator sequencing reactions are carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler
using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. Modified T7 and T3 primers are
used to sequence the ends of the inserts from the Lambda DASH II library. Sequencing reactions are analyzed on a
combination of ABI 373 DNA Sequencers and ABI 377 DNA Sequencers. All of the dye terminator sequencing reactions
are analyzed using the 2X 9 hour module on the ABI 377. Dye primer reactions are analyzed on a combination of ABI 373
and ABI 377 DNA Sequencers. The overall sequencing success rate is approximately 85% for M13-21 and M13RP1
sequences and 65% for dye-terminator reactions. The average usable read length is 485 bp for M13-21 sequences,
445 bp for M13RP1 sequences, and 375 bp for dye-terminator reactions.

4. Protocol for Automated Cycle Sequencing

The sequencing was carried out using Hamilton Microstation 2200, Perkin Elmer 9600 thermocyclers, ABI 373
and ABI 377 Automated DNA Sequencers. The Hamilton combines pre-aliquoted templates and reaction mixes
consisting of deoxy- and dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently-labelled sequencing
primers, and reaction buffer. Reaction mixes and templates were combined in the wells of a 96-well thermocycling
plate and transferred to the Perkin Elmer 9600 thermocycler. Thirty consecutive cycles of linear amplification (i.e., one
primer synthesis) steps were performed including denaturation, annealing of primer and template, and extension; i.e.,
DNA synthesis. A heated lid with rubber gaskets on the thermocycling plate prevents evaporation without the need for
an oil overlay.

Two sequencing protocols were used: one for dye-labelled primers and a second for dye-labelled dideoxy chain
terminators. The shotgun sequencing involves use of four dye-labelled sequencing primers, one for each of the four
terminator nucleotides. Each dye-primer was labelled with a different fluorescent dye, permitting the four individual
reactions to be combined into one lane of the 373 or 377 DNA Sequencer for electrophoresis, detection, and
base-calling. ABI currently supplies premixed reaction mixes in bulk packages containing all the necessary non-template
reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates with both
dye-primers and dye-terminators with approximately equal fidelity, although plasmid templates generally give longer usable
sequences.

Thirty-two reactions were loaded per ABI 373 Sequencer each day and 96 samples can be loaded on an ABI 377
per day. Electrophoresis was run overnight (ABI 373) or for 2 1/2 hours (ABI 377) following the manufacturer's protocols.
Following electrophoresis and fluorescence detection, the ABI 373 or ABI 377 performs automatic lane tracking and
base-calling. The lane-tracking was confirmed visually. Each sequence electropherogram (or fluorescence lane trace)
was inspected visually and assessed for quality. Trailing sequences of low quality were removed and the sequence
itself was loaded via software to a Sybase database (archived daily to 8mm tape). Leading vector polylinker sequence
was removed automatically by a software program. Average edited lengths of sequences from the standard ABI 373
or ABI 377 were around 400 bp and depend mostly on the quality of the template used for the sequencing reaction.

INFORMATICS 

1. Data Management

A number of information management systems for a large-scale sequencing lab have been developed. (For review
see, for instance, Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System
Sciences, IEEE Computer Society Press, Washington D. C., 585 (1993).) The system used to collect and assemble
the sequence data was developed using the Sybase relational database management system and was designed to
automate data flow wherever possible and to reduce user error. The database stores and correlates all information
collected during the entire operation from template preparation to final analysis of the genome. Because the raw output
of the ABI 373 Sequencers was based on a Macintosh platform and the data management system chosen was based
on a Unix platform, it was necessary to design and implement a variety of multi-user, client-server applications which
allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort.

2. Assembly 

An assembly engine (TIGR Assembler) developed for the rapid and accurate assembly of thousands of sequence
fragments was employed to generate contigs. The TIGR Assembler simultaneously clusters and assembles fragments
of the genome. In order to obtain the speed necessary to assemble more than 10^4 fragments, the algorithm builds a
hash table of 12 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The
number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements.
Beginning with a single seed sequence fragment, TIGR Assembler extends the current contig by attempting to add
the best matching fragment based on oligonucleotide content. The contig and candidate fragment are aligned using a
modified version of the Smith-Waterman algorithm which provides for optimal gapped alignments (Waterman, M. S.,
Methods in Enzymology 164:765 (1988)). The contig is extended by the fragment only if strict criteria for the quality
of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched
end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal
coverage and raised in regions with a possible repetitive element. Fragments representing the boundaries of
repetitive elements and potentially chimeric fragments are often rejected based on partial mismatches at the ends of
alignments and excluded from the current contig. TIGR Assembler is designed to take advantage of clone size information
coupled with sequencing from both ends of each template. It enforces the constraint that sequence fragments from
two ends of the same template point toward one another in the contig and are located within a certain range of base
pairs (definable for each clone based on the known clone size range for a given library).
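
Two of the ideas in this description, the 12 bp hash table used to list potential overlaps and the clone-end constraint, can be illustrated with a minimal Python sketch. This is not the TIGR Assembler itself: the function names, the (start, end, strand) placement tuples and the toy sequences are all hypothetical, and the Smith-Waterman alignment step is omitted.

```python
from collections import defaultdict
from itertools import combinations

K = 12  # oligonucleotide length of the hash table, as stated in the text

def kmer_index(fragments, k=K):
    """Map every k-mer to the set of fragment ids that contain it."""
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)
    return index

def potential_overlaps(fragments, k=K):
    """Count distinct shared k-mers for each fragment pair sharing at least one."""
    shared = defaultdict(int)
    for ids in kmer_index(fragments, k).values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return dict(shared)

def mates_consistent(read1, read2, size_range):
    """Check that two end reads of one clone face each other within the clone size range.

    Each read is a hypothetical (start, end, strand) placement in contig coordinates.
    """
    if read1[2] == read2[2]:
        return False  # same strand, so the reads cannot point toward one another
    plus, minus = (read1, read2) if read1[2] == "+" else (read2, read1)
    span = minus[1] - plus[0]
    return plus[0] < minus[1] and size_range[0] <= span <= size_range[1]

# Toy fragments: "a" and "b" share a 16 bp overlap; "c" shares nothing with either
fragments = {
    "a": "ACGTACGTTAGCCGATAGGCTAAC",
    "b": "TAGCCGATAGGCTAACTTGACCAG",
    "c": "GGGGGGGGGGGGGGGGGGGG",
}
overlaps = potential_overlaps(fragments)
print(overlaps)
print(mates_consistent((100, 500, "+"), (1500, 1900, "-"), (1600, 2000)))
```

A fragment that appears in an unusually large number of candidate pairs would, in the scheme described above, be flagged as likely to lie in a repetitive element.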

3. Identifying Genes 

The predicted coding regions of the Staphylococcus aureus genome were initially defined with the program zorf,
which finds ORFs of a minimum length. The predicted coding region sequences were used in searches against a
database of all Staphylococcus aureus nucleotide sequences from GenBank (release 92.0), using the BLASTN search
method to identify overlaps of 50 or more nucleotides with at least a 95% identity. Those ORFs with nucleotide sequence
matches are shown in Table 1. The ORFs without such matches were translated to protein sequences and
compared to a non-redundant database of known proteins generated by combining the Swiss-prot, PIR and GenPept
databases. ORFs of at least 80 amino acids that matched a database protein with BLASTP probability less than or
equal to 0.01 are shown in Table 2. The table also lists assigned functions based on the closest match in the databases.
ORFs of at least 120 amino acids that did not match protein or nucleotide sequences in the databases at these levels
are shown in Table 3.
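
The sorting of ORFs into Tables 1 to 3 can be summarized as a small decision rule. The Python sketch below is an illustration under stated assumptions: the internals of the zorf program are not described here, so the ORF scan shown (forward strand only, ATG to the first in-frame stop) is merely a stand-in, and all function and parameter names are hypothetical. Only the thresholds (50 nucleotides, 95% identity, 80 and 120 amino acids, probability 0.01) come from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons):
    """Return (start, end) of forward-strand ATG..stop ORFs with at least min_codons codons."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end includes the stop codon
                start = None
    return orfs

def assign_table(orf_aa_len, nt_overlap, nt_identity, blastp_prob):
    """Pick the table for one ORF; blastp_prob is None when no protein match was found."""
    if nt_overlap >= 50 and nt_identity >= 95.0:
        return "Table 1"   # matches a known S. aureus nucleotide sequence
    if orf_aa_len >= 80 and blastp_prob is not None and blastp_prob <= 0.01:
        return "Table 2"   # matches a known protein
    if orf_aa_len >= 120:
        return "Table 3"   # long ORF with no match at these levels
    return None            # not reported in any of the three tables

print(find_orfs("CCATGAAACCCGGGTTTTAAGG", min_codons=5))
print(assign_table(150, 0, 0.0, 0.5))
```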

ILLUSTRATIVE APPLICATIONS 

1. Production of an Antibody to a Staphylococcus aureus Protein 

Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the
methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as
E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by
concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the
protein can then be prepared as follows.

2. Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from
murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or
modifications of the methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein
over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated.
The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells
destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused
cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued.
Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay
procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and modified
methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use.
Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular
Biology, Elsevier, New York, Section 21-2 (1989).

3. Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogenous epitopes of a single protein can be prepared by
immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance
immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and
the host species. For example, small molecules tend to be less immunogenic than others and may require the use of
carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with both inadequate and
excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple
intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis,
J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer thereof, as
determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the
antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology,
Wier, D., ed., Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum
(about 12 uM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described,
for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman, eds., Amer.
Soc. For Microbiology, Washington, D. C. (1980).

Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which
determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively
or qualitatively to identify the presence of antigen in a biological sample. In addition, they are useful in various animal
models of Staphylococcal disease known to those of skill in the art as a means of evaluating the protein used to make
the antibody as a potential vaccine target or as a means of evaluating the antibody as a potential immunotherapeutic
reagent.

3. Preparation of PCR Primers and Amplification of DNA 

Various fragments of the Staphylococcus aureus genome, such as those of Tables 1-3 and SEQ ID NOS: 1-5,191,
can be used, in accordance with the present invention, to prepare PCR primers for a variety of uses. The PCR primers
are preferably at least 15 bases, and more preferably at least 18 bases in length. When selecting a primer sequence,
it is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are
approximately the same. The PCR primers and amplified DNA of this Example find use in the Examples that follow.
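
The primer selection criteria above (minimum length, matched G/C ratio, similar melting temperatures) can be checked with a short script. The following is a minimal sketch only, not part of the disclosed method: the function names, the 4 degree tolerance, and the reverse primer are illustrative assumptions, and the melting temperature uses the rough Wallace rule (2 degrees per A/T, 4 degrees per G/C) rather than any method stated in this specification.

```python
def gc_fraction(primer: str) -> float:
    """Return the fraction of G and C bases in the primer (0.0 to 1.0)."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> float:
    """Rough melting temperature in deg C by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def primers_compatible(forward: str, reverse: str,
                       min_length: int = 15, max_tm_gap: float = 4.0) -> bool:
    """Check the length preference (at least 15 bases) and that the two
    estimated melting temperatures differ by no more than max_tm_gap.
    The max_tm_gap default is an assumed tolerance, not a value from the text."""
    if len(forward) < min_length or len(reverse) < min_length:
        return False
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_tm_gap


if __name__ == "__main__":
    fwd = "TCCATTATGAAGTCACAAGT"  # first 20 bases of SEQ ID NO: 1
    rev = "AAAAACATTGGTAACATCGC"  # hypothetical reverse primer, for illustration
    for name, primer in (("forward", fwd), ("reverse", rev)):
        print(name, len(primer), round(gc_fraction(primer), 2), wallace_tm(primer))
    print("compatible:", primers_compatible(fwd, rev))
```

The Wallace rule is only a first approximation for short oligonucleotides; a nearest-neighbor model would be the usual refinement when the two estimates are close to the tolerance.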

4. Gene Expression from DNA Sequences Corresponding to ORFs

A fragment of the Staphylococcus aureus genome provided in Tables 1-3 is introduced into an expression vector
using conventional technology. Techniques to transfer cloned sequences into expression vectors that direct protein
translation in mammalian, yeast, insect or bacterial expression systems are well known in the art. Vectors and
expression systems are commercially available from a variety of suppliers including Stratagene (La Jolla, California),
Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate
proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular
expression organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767, incorporated herein by this reference.

The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the
Staphylococcus aureus genome fragment. Bacterial ORFs generally lack a poly A addition signal. The addition signal sequence
can be added to the construct by, for example, splicing out the poly A addition sequence from pSG5 (Stratagene) using
BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1
(Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene of Moloney
Murine Leukemia Virus. The positions of the LTRs in the construct allow efficient stable transfection. The vector includes
the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The Staphylococcus aureus DNA
is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Staphylococcus
aureus DNA and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at
the 5' end of the corresponding Staphylococcus aureus DNA 3' primer, taking care to ensure that the Staphylococcus
aureus DNA is positioned such that it is followed by the poly A addition sequence. The purified fragment obtained from
the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and
ligated to pXT1, now containing a poly A addition sequence and digested with BglII.
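
The primer construction described above, with a PstI site on the 5' primer and a BglII site at the 5' end of the 3' primer, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the 18-base annealing length, and the two-base "GC" clamp ahead of each site are hypothetical choices, not details given in this Example. The recognition sequences used are the standard ones for PstI (CTGCAG) and BglII (AGATCT).

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tailed_primers(orf: str, anneal_len: int = 18, clamp: str = "GC"):
    """Build a primer pair carrying restriction-site tails.

    The 5' primer is clamp + PstI site + the first anneal_len bases of the ORF.
    The 3' primer is clamp + BglII site + the reverse complement of the last
    anneal_len bases, so the BglII site sits at the 5' end of the 3' primer.
    The clamp and anneal_len defaults are assumptions for illustration.
    """
    if len(orf) < 2 * anneal_len:
        raise ValueError("ORF is too short for the requested annealing length")
    forward = clamp + PSTI_SITE + orf[:anneal_len]
    reverse = clamp + BGLII_SITE + reverse_complement(orf[-anneal_len:])
    return forward, reverse


def internal_sites(orf: str):
    """List the enzymes whose sites already occur inside the ORF, since an
    internal site would be cut during the PstI or BglII digestion."""
    upper = orf.upper()
    sites = (("PstI", PSTI_SITE), ("BglII", BGLII_SITE))
    return [name for name, site in sites if site in upper]


if __name__ == "__main__":
    orf = "ATG" + "AAACCCGGGTTT" * 4 + "TAA"  # hypothetical ORF, not from the Tables
    fwd, rev = tailed_primers(orf)
    print("5' primer:", fwd)
    print("3' primer:", rev)
    print("internal sites:", internal_sites(orf))
```

Checking for internal sites before ordering primers is a common precaution; an ORF that already contains either site would need a different enzyme pair.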

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island,
New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the
transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri). The protein is preferably released into the supernatant.
However, if the protein has membrane binding domains, the protein may additionally be retained within the cell or
expression may be restricted to the cell surface. Since it may be necessary to purify and locate the transfected product,
synthetic 15-mer peptides synthesized from the predicted Staphylococcus aureus DNA sequence are injected into
mice to generate antibody to the polypeptide encoded by the Staphylococcus aureus DNA.
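
Deriving candidate 15-mer peptides from a predicted DNA sequence amounts to translating the ORF and taking 15-residue windows. The sketch below illustrates that step only; the function names and the sliding-window step are assumptions, and the choice of which peptides to synthesize is not specified by this Example. The codon table is the standard genetic code, whose amino acid assignments the bacterial code shares; alternative start codons such as GTG are simply read here as their ordinary amino acid.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence from its first base, stopping at the first
    stop codon. Codons with ambiguous bases are reported as 'X'."""
    seq = dna.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def peptide_windows(protein: str, length: int = 15, step: int = 1):
    """Return every window of the given length along the protein sequence."""
    return [protein[i:i + length]
            for i in range(0, len(protein) - length + 1, step)]


if __name__ == "__main__":
    # Hypothetical coding sequence, used only to exercise the functions.
    cds = "ATGAAAAAGGGAGGTGTCTCGTGTCATATCATTGGTTTAAGAAAATGTTACGTAGTTTAG"
    protein = translate(cds)
    for peptide in peptide_windows(protein, length=15, step=5):
        print(peptide)
```

A protein shorter than the window length yields no peptides, so short ORFs would need a smaller window or a full-length synthetic peptide.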

Alternatively, if antibody production is not possible, the Staphylococcus aureus DNA sequence is additionally
incorporated into eukaryotic expression vectors and expressed as, for example, a globin fusion. Antibody to the globin
moiety then is used to purify the chimeric protein. Corresponding protease cleavage sites are engineered between the
globin moiety and the polypeptide encoded by the Staphylococcus aureus DNA so that the latter may be freed from
the former by simple protease digestion. One useful expression vector for generating globin chimerics is pSG5
(Stratagene). This vector encodes a rabbit globin. Intron II of the rabbit globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These
techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods
texts such as Davis et al., cited elsewhere herein, and many of the methods are available from the technical assistance
representatives from Stratagene, Life Technologies, Inc., or Promega. Polypeptides of the invention also may be
produced using in vitro translation systems such as in vitro Express™ Translation Kit (Stratagene).

While the present invention has been described in some detail for purposes of clarity and understanding, one
skilled in the art will appreciate that various changes in form and detail can be made without departing from the true
scope of the invention.

All patents, patent applications and publications referred to above are hereby incorporated by reference. 
Table 4

ORF    BLAST    Antigenic Regions: Region 1    Region 2    Region 3    Region 4



1 £A C 


HI Q? 


lipoprotein 


^fi-4 5 


84-103 


1 52-161 


1 76-185 


9Qft 1 




chrA 


7 1 -3Q 

C 1 O 7 


48-58 


84-95 


232-249 


C1 "5 

5 l_Z 


5 i 94 


OppB gene product (B. sub" 






1 00-1 1 2 


121-131 


278_3 


51 95 


lipoprotein 1 






85-97 


1 62-1 71 


276_2 


51 96 


lipoprotein 






1 77-1 Rfi 


C 1 1 "CC\J 


45_4 


5197 


ProX t 


ca-3 7 


5Q.CQ 


ft 5-1 nn 


1 tU* 1 1 27 


315.8 


5198 


hypothetical protein 


AC C Vt 

45-54 


QQ.Q7 




C *t j" C D 0 


1S4_1S 


5199 


•unknown ' 


0 1 Af\ 

6 l -4U 




7Q-RR 




22B_3 


5200 


! unknown J 


25-38 






OvTOj 


228_6 


S201 


! unknown i 


*\ ft A 1 

29-41 


03* 1 V 1 


1 to - 1 TJ 


1 f J- 1 04 


S0_1 


5202 


: unknown ! 


21-33 




1 too- 1 oc 


1 07 OftC 


112.7 


5203 


■iron-bindinq periplasmic ; 


21-31 


5o-©f 




111-1 ?n 


442.1 


5204 


.unknown ' 


30*39 


91-1 OO 






66.2 


5205 


: unknown 


50-59 


1 04-1 1 6 


1 1 7-1 06 


1 o7- 1 62 


i 304_2 


5206 


! 0-bindinq periplasmic i 


19-28 


48-57 


75-84 


103-1 16 


: 44_1 


5207 


1 hypothetical protein 


27-36 


86-95 


1 Oft i 0 0 

1 29-1 38 


1 92-201 


161_4 


5208 


: SphX 


27-44 


1 49-161 


ICC 1 "7 C 

1 00- 1 75 


tOl -tl O 


46_5 


S209 


xmpC (permease) 


21-33 


61-70 


83-32 


1 00-109 


942.1 


5210 


;traH [Plasmid pSK41 ] ! 


83-92 


109-1 18 


1 27-142 




5_4 


5211 


10RF (S. aureus) ! 


12-22 


87-96 


111-1 20 


151-1 60 


20_4 


5212 


peptidoglycan hydrolase (5.' 


24-34 


1 29-138 


141-1 50 


1 61-1 71 


328_2 


5213 


i lipoprotein (H. flu) i 


81-90 


123-133 


290-299 




S20_2 


5214 


i fibronectin binding protein : 


44-54 


63-79 


81-90 


95-110 


771.1 


5215 


'emml gene product (S. py< 


30-39 


65-82 


96-106 


112-121 


999_1 


5216 


• predicted trithorax prot. (0 


7-16 


1 20-1 29 


157-166 




853_1 


S217 


ORF2136 (Marchantia polyr 


43-52 


88-97 


102-111 




287.1 


5218 


psaA homolog 


13-22 


28-44 


72-82 


114-124 


288.2 


5219 


cell wall enzyme 


14-23 


89-98 






596 2 


5220 


penicillin binding protein 2b 


40-49 


S9-68 


; 76-87 


106-115 


217.5 


5221 


fibronectin/fibrinogen bindii 


28-37 


40-49 


62-71 


1 93-111 


217.6 


S222 


fibronectin/fibrinogen bp 


1 0-1 9 


31-40 


54-62 


i 73-92 


528.3 


5223 


myosin cross reactive prote 


4-1 3 


29-47 


60-73 


; 90-99 


171.1 1 


5224 


EF 


20-31 


! 91-110 


i 


63.4 


5225 


• penicillin binding protein 2b 


1 CrC 1 


1 J3 00 


95-104 ! 


35o.c 


C99C 

DC CO 




46-55 


1 62-71 




743_. ■ 


Z>CCf 


KUa protein in iim^ icyi' 


23-32 


68-79 


i 94-103 


175-184 




DCCO 


1 Wlicniny rriuuitty 


10-19 


48-60 


83-92 


111-121 


69.3 


i 5229 


arabinogalactan protein 


97-106 


132-141 


i 158-167 


180-189 


70.6 


; 5230 


nodulin 


36-45 


48-57 


■ 137-160 


179-188 


129.2 


i 5231 


glycerol diester phosphodie 


8-17 


41-50 


i 55-74 


97-106 


58_5 


: 5232 


PBP (S. aureus) 


26-35 


70-79 


1 1 7-1 26 


152-161 


188.3 


5233 


MHC class II analog (S. aure 


72-81 


94-103 


115-124 


136-145 


236.6 


: 5234 


histidine kinase domain (Die 


24-33 


52-67 


81-94 


106-121 


310.8 


5235 


clumping factor (S. aureus) 


59-71 


77-86 


: 93-102 


5 118-127 


601.1 


5236 


novel antigen/0RF2 (S. aui 


45-54 


91-104 


108-117 


186-195 


544.3 


5237 


ORF YJR1 5 1 c (S. cerevisae] 


76-90 


101-111 


131-140 


154-164 


662.1 


5238 


MHC class II analog IS. aure 


22-32 


71-80 


89-98 


114-122 


87.7 


5239 


5' nucleotidase precursor (' 


29-45 


62-71 


: 105-114 


125-137 


120.1 


5240 


B65G qene product (B. sub 


102-111 
Table 4

ORF       Antigenic Regions (cont)
          Region 5    Region 6    Region 7    Region 8    Region 9    Region 10
168_6     244-272     303-315
238_1     260-269     291-301     308-317
51_2      140-152     188-208     211-220     256-266     273-283
278_3     198-209
276_2     255-268
45_4      177-199     221-230     234-243     268-279     284-293     304-313
316_8
154_15    148-157     177-187     202-211
228_3     101-119     139-154     166-181
228_6
50_1
112_7     136-149     197-211     218-229     253-273
442_1     199-210     247-257     264-277     287-309
66_2
304_2     178-187     250-259
44_1
161_4
46_5      131-141     162-176     206-215     243-252     264-273     285-294
942_1
5_4       189-205     230-239     246-264     301-318     340-354     378-387
20_4      202-212     217-234     260-275     314-336     366-373     380-391
328_2
520_2
771_1     145-154
999_1
853_1
287_1     154-164
288_2
596_2     121-130
217_5     244-253     259-268     288-297     302-311
217_6     144-156     174-183     188-197     207-216     226-242
528_3
171_11
63_4
353_2
743_1     197-207
342_4
69_3      195-211
70_6      206-215     263-272     291-301     331-340     358-371     390-414
129_2     117-127     141-157     168-183     202-211     222-231     261-270
58_5      184-203     260-269     275-299     330-344     372-381     424-433
188_3
236_6     138-147     163-172     187-198     244-261     268-278     308-317
310_8     131-140     144-153     177-186     190-199     204-213     216-227
601_1     208-218
544_3     170-179     184-193     224-235     274-287     327-336     352-361
662_1
87_7
120_1

Table 4

ORF       Antigenic Regions (cont)
          Region 11   Region 12   Region 13   Region 14   Region 15   Region 16
168_6
238_1
51_2
278_3
276_2
45_4
316_8
154_15
228_3
228_6
50_1
112_7
442_1
66_2
304_2
44_1
161_4
46_5      306-315
942_1
5_4       393-407     416-426     456-465
20_4      396-405     410-419     461-481
328_2
520_2
771_1
999_1
853_1
287_1
288_2
596_2
217_5
217_6
528_3
171_11
63_4
353_2
743_1
342_4
69_3
70_6      453-471     506-515
129_2     296-315
58_5
188_3
236_6     358-377     410-423     428-439     442-457     467-476     480-493
310_8     238-251     256-275     281-290     296-310     314-333     338-347
601_1
544_3
662_1
87_7
120_1

Table 4

ORF       Antigenic Regions (cont)
          Region 17   Region 18   Region 19   Region 20   Region 21   Region 22
168_6
238_1
51_2
278_3
276_2
45_4
316_8
154_15
228_3
228_6
50_1
112_7
442_1
66_2
304_2
44_1
161_4
46_5
942_1
5_4
20_4
328_2
520_2
771_1
999_1
853_1
287_1
288_2
596_2
217_5
217_6
528_3
171_11
63_4
353_2
743_1
342_4
69_3
70_6
129_2
58_5
188_3
236_6
310_8     357-366     370-379     429-438     443-452     478-487     551-560
601_1
544_3
662_1
87_7

Table 4

ORF       Antigenic Regions (cont)
          Region 23   Region 24   Region 25   Region 26   Region 27   Region 28
168_6
238_1
51_2
278_3
276_2
45_4
316_8
154_15
228_3
228_6
50_1
112_7
442_1
66_2
304_2
44_1
161_4
46_5
942_1
5_4
20_4
328_2
520_2
771_1
999_1
853_1
287_1
288_2
596_2
217_5
217_6
528_3
171_11
63_4
353_2
743_1
342_4
70_6
129_2
58_5
188_3
236_6
310_8     622-632     670-685     708-718     823-836     858-867     877-886
601_1
544_3
662_1
87_7

Table 4

ORF       Antigenic Regions (cont)
          Region 29   Region 30
168_6
238_1
51_2
278_3
276_2
45_4
316_8
154_15
228_3
228_6
50_1
112_7
442_1
66_2
304_2
44_1
161_4
46_5
942_1
5_4
20_4
328_2
520_2
771_1
999_1
853_1
287_1
288_2
596_2
217_5
217_6
528_3
171_11
63_4
353_2
743_1
342_4
69_3
70_6
129_2
58_5
188_3
236_6
310_8
601_1
544_3
662_1
87_7
120_1

Table 4

                                                    Antigenic Regions
ORF               BLAST HOMOLOG                     Region 1    Region 2    Region 3    Region 4
46_1      5241    aldehyde dehydrogenase            8-17        36-52       83-96       112-121
63_4      5242    glycerol ester hydrolase (P.      9-26        57-73       93-107      123-133
174_6     5243    ketopantoate hydroxymeth          71-80       203-212     242-254     265-274
206_16    5244    ornithine acetyltransferase       1-10        34-43       54-63       194-210
267_1     5245    NaH-antiporter protein (E. 1*     120-129     332-347     398-408
322_1     5246    acriflavin resistance protein     58-75       153-164     203-231     264-284
415_2     5247    transport ATP-binding proU        108-126     218-227     298-308     315-334
214_3     5248    2-nitropropane dioxygenase        123-136     216-233     283-292     297-306
587_3     5249    clumping factor                   5-14        43-54       59-68       76-95
685_1     5250    signal peptidase                  59-68       72-81       86-95       99-108
54_3      5251    fibronectin binding protein 1     23-32       37-46       50-59       89-98
54_4      5252    fibronectin binding protein 1     43-52       66-75       95-104      147-156
54_5      5253    fibronectin binding protein 1     49-60       81-90
54_6      5254    fibronectin binding protein 1     55-71       82-97       139-158     175-186
328_1     5255    lipoprotein (H. flu)              11-20       6 W0        96-105




Table 4

ORF       Antigenic Regions (cont)
          Region 5    Region 6    Region 7    Region 8    Region 9    Region 10
46_1      215-242     333-352     376-385     416-432     471-487
63_4      145-154     191-202     212-223     245-265     274-283     291-300
174_6
206_16    239-259     275-284
267_1
322_1     298-319     350-359
415_2     344-353     371-380     395-404     456-465     486-495     518-527
214_3     318-337     365-375
587_3     106-115     142-151     156-166     173-182     186-198     204-213
685_1     113-122     130-145
54_3      128-138     185-194     217-226     251-260     268-277     295-305
54_4      175-188     191-200     203-212     220-229
54_5
54_6      220-230     287-304     317-326     344-353     364-373     378-387
328_1

Table 4

ORF       Antigenic Regions (cont)
          Region 11   Region 12   Region 13   Region 14   Region 15   Region 17
46_1
63_4      306-315     319-328     366-376     395-420     453-462     467-476
174_6
206_16
267_1
322_1
415_2     539-555
214_3
587_3     217-226     278-287     318-327     332-342     351-360     377-386
685_1
54_3      316-325     329-345     35*f-372    387-396     416-425
54_4
54_5
54_6      396-407     427-436     514-531     541-550     569-578     612-622
328_1



Table 4

ORF       Antigenic Regions (cont)
          Region 18   Region 19   Region 20   Region 21   Region 22   Region 23
46_1
63_4      485-500     513-525
174_6
206_16
267_1
322_1
415_2
214_3
587_3     396-405     426-442     459-470     485-494     505-514     531-562
685_1
54_3      455-462     472-491     517-536
54_4
54_5
54_6      639-648     673-681     703-715     723-732     749-760     772-788
328_1

Table 4

ORF       Antigenic Regions (cont)
          Region 24   Region 25   Region 26   Region 27   Region 28   Region 29
46_1
63_4
174_6
206_16
267_1
322_1
415_2
214_3
587_3     567-578     584-601     607-840     844-854     858-870     877-886
685_1
54_3
54_4
54_5
54_6      793-802     811-826     834-848     866-876     893-903     907-918
328_1


Table 4

ORF       Antigenic Regions (cont)
          Region 30   Region 31
46_1
63_4
174_6
206_16
267_1
322_1
415_2
214_3
587_3     889-911     927-936
685_1
54_3
54_4
54_5
54_6      925-944     951-997
328_1

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: Human Genome Sciences, Inc.
(B) STREET: 9410 Key West Avenue
(C) CITY: Rockville
(D) STATE: Maryland
(E) COUNTRY: US
(F) POSTAL CODE: 20850

(ii) TITLE OF INVENTION: Staphylococcus aureus Polynucleotides and Sequences

(iii) NUMBER OF SEQUENCES: 5255

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4 Mb storage
(B) COMPUTER: HP Vectra 486/33
(C) OPERATING SYSTEM: MSDOS version 6.2
(D) SOFTWARE: ASCII Text

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 60/009,861
(B) FILING DATE: 05-JAN-1996

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5895 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

TCCATTATGA AGTCACAAGT ACTATAAGCT GCGATGTTAC CAATGTTTTT TAAAATCCCA 60 

GTAATAAAAT CAAAAAATAA GTTAAATAAT GTATTCATTT TAAGTCCTCC TTAATAAAGa 120 

1S aaataGGTAA TAATGTAATA GCTTCTATTA TGATGCCTAA TTGAATGAAT TGGGCAAATG 180 

GCTCTTTGAT GATAAGTGTG ATAATGAAAA GGGTTAAACT AACAATAATC GCATAATATT 240 

TTTTTCGTTT AATAAGTCGC ACAGGAATGG GCTTCTTTTT AGTTGCTGCA GGAGCATATA 300 

CTGAGATTAC ACCTAAAGAA ATAACTGTTA AAATAATCAT AATTAAAAAG TTAATATGAA 360 

AATTTACTAT TACTAAAGGT AAAAGTATAA ATAGTATAAT ACTTTCTACA TAACACCAAA 420 

AAGAAGAAGG TGCATGTGCa CCATGTGCAT GtCTTCTTAT TAAATAAAAT GTTAAATTCG 480 

TAATTAACGT AAACAGAAAA ATGTTTAAAA TATAGGCAAT AGTATACATA ACAATTAATT 540 

TACCTATATT TTTAGCTAAG ACCTGCATCC CTAATCGTAC TTGCAAAAAT TGAATATGAT 600 

CTAAGTTATT TCTCTTTTGA AGATACGTGG CAAACTGGTC AATTTTATTA TCAAAATAAT 660 

TCAATTTTAC ACCACTCTCC TCACTGTCAT TATACGATTT AGTACAATCT TTTAT CATTA 720 

T ATTGC CTAA CTGTAGGAAA TAAATACTTA ACTGTTAAAT GTAATTTGTA TTTAATATTT 780 

35 TAACATAAAA AAATTTACAG TTAAGAATAA AAAACGACTA GTTAAGAAAA ATTGGAAAAT 840 

AAATGCTTTT AGCATGTTTT AATATAACTA GATCACAGAG ATGTGATGGA AAATAGTTGA 900 

TGAGTTGTTT AATTTTAAGA ATTTTTATCT TAATTAAGGA AGGAGTGATT TCAATGGCAC 960 

40 

AAGATATCAT TTCAACAATC GGTGACTTAG TAAAATGGAT TATCGACACA GTGAACAAAT 102 0 

TCACTAAAAA ATAAGATGAA TAATTAATTA CTTTCATTGT AAATTTGTTA TCTT CGTAT A 108 0 

GTACTAAAAG TATGAGTTAT TAAGCCATCC CAACTTAATA ACCATGTAAA ATTAGCAAGT 114 0 

45 

GAGTAACATT TGCTAGTAGA GTTAGTTTCC TTGGACTCAG TGCTATGTAT TTTTCTTAAT 1200 

TATCATTACA GATAATTATT TCTAGCATGT AAGCTATCGT AAACAACATC GATTTATCAT 1260 

50 TATTTGATAA ATAAAATTTT TTTCATAATT AATAACATCC CCAAAAATAG ATTGAAAAAA 1320 

TAACTGTAAA ACATTCCCTT AATAATAAGT ATGGTCGTGA GCCCCTCCCA AGCTCGCGGC 13 80 

CTTTTTTGTA ATGAAGAAGG GATGAGTTAA TCATCATTAT GAGACCCGCC GTTAAAATAT 1440 
TCATTTGCAA AGGGCGAAAT GGGTTCTTAC TGAGTTATCT ATTATAAAAA AATAAACATA 1560

GACTTATGAA AAATCTCTCA TAAATCTATG TTTAGTCATG aCATGTGTTA AATATTATTT 1620 

5 

CGGGCGCTTC TTATTTATAC AAATCTAATT TAATACTTTT AAATACAGGT ATATTTTCgC 1680 

GTTGCTGTTC TACTTCATTT AAGTTTAAAT CTACAGTCAA AATATCTGCG GATTCATTTA 1740 

io ATTCTCCAAC TAAATCTCCA TTTGGGTTTA TAACTATCGA ATGACCAGCA TATTCTGTGT 1800 

TACCATCGAA TCCAGTGCTA TTAGTTCCAA TGACAAACAT ATTATTTTCA ATTGCACGTG 1860 

CCTTTAGTAA TGAATGCCAA TGTTGAAGAC GTGACATAGG CCATTGCGCC ACATAAAATG 1920 

75 CAATTTTAGC ACCACTACGA GCAGGATATC TTAATAATTC TGGAAAACGT AAATCATAAC 1980 

AGATAAGTTG GGTCACATAA GTACCGTCAG ACAATTGAAA GGGTTCAGCT ACGTATTCGC 2040 

CAGCGGTTAA AAATTCATGC TCTCTTAACA TAGGAACTAA ATGAACTTTG TCGTATTCaT 2100 

TAATCAGCTG GCCACTTTTA TTCACACTAA AAGCTGTATT AAATATTTGA TTGTTTCTAA 2160 

TGTTAGAAAC TGACCCAGCT ACGATATCGA CTTTATATTT TTCAGCTAAA TGTTTAATAA 2220 

ATGAAAAACT TTGTCCTAGA TTATTATCTG CTTTTTCATT TAAATGCTCT AAATCATAGC 2280 

CATTATTCCA CATTTCAGGT AAAACGACTA CAT CTACTT C AGCATTCATA TTTTTTTCGA 2340 

ACCATTGCGT TATTTGAGTT TCATTTTTAG AACTATCTCC AAAAACAATC GGTAATTGAT 2400 

30 AAATTTGGAC TTTCATAACA TCACATCCTT GATAGATCTT ATATATAACT TACTAAAAGT 2460 

TATGTTGAAA CGCAAAAAAC GAGCACAAGA CATAAAATCA AAGTCCTAGG CTCTACAAAG 2520 

TTATATTGAC AGTAGTTGAT GGGGCCCCAA CATAGAGAAA TTGGAACACC AATTTCTACA 2580 

GACAATGCAA GTTGGGGTGG GCTCTAACAT AAAGAAATAC TTTTTCTTTA GAAATTAGTA 2640 

TTTCTTATAC ATGAGTTTTA CTCATGTATT CCTATTCTTA AGTGCACATT AGCAGCGGCT 2700 

AATGTGTAAG AACTACTACA TAATGAATAA CTAATGATTC TTTATCATTT CTGTCCCATT 2760 

CCTAACAATA TATTGATTAT TTTTTTATTA CGAAACGATC TTCCACTGGA TTAAATGTTT 2820 

TTTCGCCAGC AGCTTCACGA ATATCACCAA ATGGCATTTG AGCAATAAGT TTCCAACTTT 2880 

TAGGAATATT AAATTCATTT GAAGTCATCT CATCAACAAG TGGATTATAG TGTTGTAATG 2 940 

AAG CACCT AT GCCTTTAGTA GCTAATGCAG TCCAAATTGC AAATTGATGC ATGGCATTTG 3000 

TTTGAGTTGA CCATATTGCA AAATTATCAT AGTAGTTTGG CATTTGTTCT TGTAAACCAC 3060 

50 TTACAACATC TTGATCTTCA TAAAACAAAA TTGTACCGTA TGAATGTTTG AAGTTATCAA 3120 

TTTTTTGTTC AGTTGGCTCG AAATCACGAT TCTCTCCCAT GACTTCTTTT AAAATTGCTT 31B0 

TTGTGTTATC CCAAAATTTA TTATTGTTGT CATTTAACAA GAGAACAATT CTAGTTGATT 3240 
CATCGCTAAT TGATATCGAA TCTTTCAAAT TATATATTGA ACGTCTTTCT TCCATTGCAT 3360

TGTCAAAAGT CATTGCTTTT TTATCTTTTT TAAATAAGCC CATAATTATT GCTCCTTCTT 3420

TAGTAAAGAA TACTTAATAG ACTAAGTATA AAATTTATAC TCGTACTTGT AAAGCAATAT 3480

TTACGAAAAT TTCAAGAATA TTAATATTCA TTTTCAAATT CCAAATATAA ATGCATTTTC 3540

AACGCATATT TATTATACTT AGATTAATAC TTACATGAAA AAGGGAGGTG TCTCGTGAAA 3600

TGTCATATCA TTGGTTTAAG AAAATGTTAC TTTCAACAAG TATTTTAATT TTAAGTAGTA 3660

GTAGTTTAGG GCTTGCAACG CACACAGTTG AAGCAAAGGA TAACTTAAAT GGAGAAAAAC 3720

CAACTACTAA TTTGAATCAT AATATAACTT CACCATCAGT AAATAGTGAA ATGAATAATA 3780

ATGAOACTGG GACACCTCAC GAATCAAATC AAACGGGTAA TGAAGGAACA GGTTCGAATA 3840

GTCGTGATGC TAATCCTGAT TCGAATAATG TGAAGCCAGA CTCAAACAAC CAAAACCCAA 3900

GTACAGATTC AAAACCAGAC CCAAATAACC AAAACTCAAG TCCGAATCCT AAACCAGATC 3960

CAGATAACCC GAAACCAAAA CCGGATCGAA AACCAGACCC AGATAAACCA AAGCCAAATC 4020

CGGATCCAAA ACCAGATCCA GATAACCCGA AACCAAATCC AGATCGAAAA CCAGACCCAG 4080

ATAAACCAAA GCCAAATCCG GATCCAAAAC CAGATCCAGA TAAACCAAAG CCAAATCCGA 4140

ATCGAAAACC AGACCCTAAT AAGCCAAATC CTAACCCGTC ACCAGATCCC GATCAACCTG 4200

GGGATTCCAA TCATTCTGGT GGCTCGAAAA ATGGGGGGAC ATGGAACCCA AATGCTTCAG 4260

ATGGATCTAA TCAAGGTCAA TGGCAACCAA ATGGGAATCA AGGAAACTCA CAAAATCCTA 4320

CTGGTAATGA TTTTGTATCC CAACGATTTT TAGCCTTGGC AAATGGGGCT TACAAGTATA 4380

ATCCGTATAT TTTAAATCAA ATTAATAAGT TGGGCAAAGA TTATGQAGAA GTTACTGATG 4440

AAGACATTTA TAATATTATT CGAAAACAAa ATTTCAGCGG AAATGCATAT TTAAATGGAT 4500

TACAACAGCA ATCGAATTAC TTTAGATTCC aATATTTCAA TCCATTGAAA TCAGAAAGGT 4560

ACTATCGTAA TTTAGATGAA CAAGTACTCG CATTAATTAC TGGTGAAATT GGATCAATGC 4620

CAGATTTGAA AAAGCCCGAA GATAAGCCGG ATTCAAAACA ACGCTCATTT GAACCGCATG 4680

AAAAAGACGA TTTTACAGTA GTTAAAAAAC AAGAAGATAA TAAGAAAAGT GCGTCAACTG 4740

CATATAGTAA AAGTTGGCTA GCAATTGTAT GTTCTATGAT GGTGGTATTT TCAATCATGC 4800

TATTCTTATT TGTAAAGCGA AATAAAAAGA AAAATAAAAA CGAATCACAG CGACGATAAT 4860

CCGTGTGTGA TTCGTTTTTT TTATTATGGA ATAAAAATGT GATATATAAA ATTCGCTTGT 4920

TCCGTGGCTT TTTTCAAAGC CTCAGGATTA AGTAATTGGA ATATAACGAC AAATCCGTTT 4980

TGTAACATAT GGATAATAAT TGGAACAGCA AGCCGTTTTG TCCAAACATA TGCTAATGAA 5040
AATATTAATG AACTTACTGT TGTAGCAATA ATAAATGCCA CGATACGATT ACCTTTAATC 5160

GCATTAAATA ATTCTCCAAA GATTACTTTT CTGAATACAT ATTCTTCTAA TAAAGGACCA S220 

5 

ATAATAGATA CAAAGAAGAT AAATATAGGT ATTTTTCGAG CAATAATAAT TAG CTTTTCT 5280 

GTATTAGGAC TTACTTGTTG TCCACCATAA ATTTGCGTTA ATACAATGCT CACTACCATT 5340 

TGATAAATCA TTACCAATGC AAATCCAAGC AATGCCCATG GAATGATATA TTTTTTAGGT 5400 

10 

TCTTTAACTT CTAATTCTAA TTTTGTTGG A TTTTTAATTT TTAAATTAAT TAAAATAATC 5460 

GTCGTGGCGG CGATTAAAAA TAGAACAAGT TGTATGTAAA TGACTGCTTT AGTCAGTTCT 5520 

15 ATGCCACTAT ATTGTACAAA TGGTAATTTT TTTACAATGA GAAGCGGTAA AAATTGAGAC 5580 

AATATATAAA TAATAACAGT TAGCAATGAT GCCCATAATC t TGTCATAAT TTTCCTCCAA 5640 

ATATTTGTTT ATAATTTATT TTATCGTAAA TAACTTGAAG TTACAAAACT TAATTAAAAG 5700 

20 

GTTATGACTT GAAATTTTGA CCAAATTTGA TTATTATAAA TGTATGTTAG CACTCTTTAA 5760 

TGTTAAGTGC TAAACTTTAG GTTTTTTAAG GAGGAACAAT CATGCTAAAA CCAATTGGAA 5820 

ATCGTGTGAT TATTGAGAAA AAAGAACAAG AACAAACAAC TAAAAGTGGn ATTGTTTAAC 5880 

25 

TGATAGTGCT AAAGA 5895 

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6796 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

TTTGAAAAAA CAAGGTACGA TTGGTTTAAT AACATATATG AGAACCGATT CTACACGTAT 60 

40 

TTCaGATACT GCCAAAGTTG AAGCAAAACA GTATATAACT GATAAATACG GTGAATCTTA 120 
CACTTCTAAA CGTAAAGCAT CAGGGAAACA AGGTGACCaA GATGCCCATG AGGCTATTAG 180 

45 ACCTTCAAGT ACTATGCGTA CGCCAGATGA TATGAAGTCA TTTTTGACGA AAGACCAATA 240 
CCGATTATAC AAATTAATTT GGGAACGATT TGTTGCTAGT CAAATGGCTC CAGCAATACT 300 
TGATACAGTC TCATTAGACA TAACACAAGG TGACATTAAA TTTAGAGCGA ATGGTCAAAC 360 

60 AATCAAGTTT AAAGGATTTA TGACACTTTA TGTAGAAACT AAAGATGATA GTGATAGCGA 420 
AAAGGAAAAT AAACTGCCTA AATTAGAGCA AGGTGATAAA GTCACAGCAA CTCAAATTGA 4 80 

ACCAGCTCAA CACTATACAC AACCACCTCC AAGATATACT GAGGCGAGAT TAGTAAAAAC 540 

55 
AAAGCGTAAC TATGTCAAAT TAGAAAGTAA GCGTTTTGTT CCTACTGAGT TGGGAGAAAT 660

AGTTCATGAA CAAGTGAAAG AATACTTCCC AGAGATTATT GATGTGGAAT TCACAGTGAA 720 

6 

TATGGAAACG TTACTTGATA AGATTGCAGA AGGCGACATT ACATGGAGGA AAGTAATCGA 780 

CGGTTTCTTT AGTAGCTTTA AACAAGATGT TGAACGTGCT GAAGAAGAGA TGGAAAAGAT 84 0 

10 TGAAATCAAA GATGAGCCAG CCGGTGAAGA CTGTGAAATT TGTGGTTCTC CTATGGTTAT 900 

AAAAATGGGA CGCTATGGTA AGTTCATGGC TTGCTCAAAC TTC CCGGATT GTCGTAATAC 960 

AAAAGCGATA GTTAAGTCTA TTGGTGTTAA ATGTCCAAAA TGTAATGaTG GTGACGTCGT 1020 

15 AGAAAGAAAA TCTAAAAAGA ATCGTGTCTT TTATGGATGT TCGAAATATC CTGAATGCGA 1080 

CTTTATCTCT TGGGATAAGC CGATTGGAAG AGATTGTCCA AAATGTAACC AATATCTTGT 1140 

TGAAAATAAA AAAGGCAAGA CAACACAAGT AATATG TTCA AATTGCGATT ATAAAGAGGC 1200 

20 

AGCGCAGAAA TAATATTTTT ATTTCCTAGA TACATTTTAA GATTGTTAAA TAGAATCATT 1260 

AGTGAATCTT ATTTTAAAGA TAGTAAAGGA TTAATCTAAA TAAGTGCGGA TAATATAAAC 1320 

ATAACAACAT AATTAAmAGA CATAAATGAC aATAAAAGGA GTATAGAAAT GACTCAAACT 13 80 

25 

GTAAATGTAA TAGGTGCTGG TCTTGCCGGT TCAGAAGCGG CATATCAATT AGCTGAAAGA 144 0 

GGAATTAAAG TTAATCTAAT AGAGATGAGA CCTGTTAAAC AAACACCAGC G CACCAT ACT 1500 

30 GATAAATTTG CGGAACTTGT ATGTTCCAAT TCATTACGCG GAAATGCTTT AACTAATGGT 1560 

GTGGGTGTTT TAAAAGAAGA AATGAGAAGA TTGAATTCTA TAATTATTGA AGCGGCTGAT 1620 

AAGGCACGAG TTCCAGCTGG TGGTG CATTA G CAGTTGATA GACACGATTT TTCAGGTTAT 1680 

35 ATTACTGAAA CACTTAAAAA TCATGAAAAT ATCACAGTTA TTAATGAAGA AATTAATGCC 174 0 

ATTCCAGATG GATACACAAT TATCGCAACA GGACCACTTA CTACAGAAAC CCTTGCGCAA 1800 

GAAATAGTGG ACATTACTGG TAAAGATCAA CTTTATTTCT ATGATGCGGC TGCTCCAATT 1860 

40 

ATTGAAAAAG AATCTATTGA TATGGATAAA GTTTACTTAA AGTCCCGTTA TGATAAAGGT 1920 

GAAGCTGCAT ATTTAAACTG TCCTATGACT GAGGATGAAT TTAATCGCTT TTATGATGCA 1980 

GTATTAGAAG CTGAAGTTGC GCCTGTAAAT TCATTTGAAA AAGAAAAATA TTTCGAGGGT 2040

TGTATGCCTT TTGAAGTAAT GGCAGAACGC GGACGCAAGA CATTACTATT TGGACCAATG 2100 

AAACCAGTAG GATTAGAAGA TCCAAAGACT GGGAAACGTC CTTATGCGGT GGTTCAATTA 2160 

AGACAAGATG ACGCTGCTGG TACACTCTAC AATATTGTTG GCTTCCAAAC GCATTTAAAA 2220

TGGGGAGCTC AAAAAGAAGT CATTAAATTA ATTCCAGGCT TAGAAAATGT TGATATTGTT 2280 

AGATATGGTG TGATGCATAG AAATACCTTC ATTAATTCAC CGGACGTATT AAACGAGAAA 2340
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TATGTAGAAA GCGCAgcTAG CGGCTTAGTT GCAGGTATCA ATCTTOCGCA TAAAATATTA 2460 

GGCAAGGGTG AGGTAGTATT TCCGAGAGAA ACAATGATTG GAAGTATGGC TTACTATATT 2520


TCTCATGCTA AAAACAATAA GAATTTCCAA CCTATGAATG CTAACTTCGG GTTATTACCA 2580 

TCTTTAGAAA CTAGAATTAA AGATAAAAAA GAACGCTATG AAGCACAAGC TAATAGAGCT 2640 

TTGGATTACT TAGAAAATTT CAAAAAAACT TTATAAAATA GTTAGAAAGA CTAGATATGC 2700

TATTCATTCT TAAGTCATCA ACGAGTAAGT AATGACTTTC TAAATGGAAA ATACTTATCC 2760 

TAGTCTTTTT AATTTTGGAA TTGTTACGTA TTTCTGACAA TTTAGAATTC GCATTCAAAA 2820 

AATATCTAAA TAAATAACAC GCAATAAGTT GATTGATGTA ACATGTAAGA GAATGTTTTA 2880

AATAAACTTT ATTTAAAAGG CAATGAAATA ATAAATGGCA AGGCTATTAA TAAAGACTTT 2940 

TAGTAATTAA TTTAAAAAAG AGGTATTCTA ATTAACAGGT TTTCCGATTA GTTACAATTA 3000 


TTTAATTCTC AAAAGATTTA GAATTGATTA TCAAATTACT GTAAGCCCTT TGCTGTATAT 3060 

GCTACAATTC TTATTGATGG AGGGTAAATG TATTGAATCA TATTCAAGAT OCGTTTTTAA 3120 

ATACATTGAA AGTTGAACGG AATTTTTCGG AACACACATT GAAATCATAT CAAGATGACT 3180

TAATTCAGTT TAATCAATTT TTAGAACAAG AACATTTAGA GTTGAATACT TTTGAATACA 3240

GAGATGCTAG AAATTATTTG AGCTATTTAT ATTCAAATCA TTTGAAAAGA ACATCTGTTT 3300 

CTCGTAAAAT CTCAACGTTA AGAACTTTCT ATGAATATTG GATGACGCTT GATGAGAACA 3360

TTATTAATCC ATTTGTTCAA TTAGTACATC CGAAAAAAGA AAAATATCTT CCGCAATTCT 3420 

TTTACGAAGA AGAAATGGAA GCGTTATTCA AAACTGTAGA AGAGGACACT TCAAAAAATT 3480 


TACGGGATCG AGTTATTCTT GAATTGTTGT ATGCTACAGG CATCCGTGTT TCGGAATTAG 3540

TAAATATTAA AAAACAAGAT ATAGATTTTT ACGCGAATGG TGTTACCGTA TTAGGAAAAG 3600 

GGAQ^AAAGA GCGCTTTGTA CCGTTTGGTG CTTATTGTAG ACAAAGCATC GAAAATTATT 3660 


TAGAACATTT CAAACCAATT CAGTCATGCA ATCATGATTT TCTTATTGTA AATATGAAGG 3720 

GTGAAGCAAT CACTGAACGC GGTGTACGAT ATGTTTTAAA TGATATTGTT AAACGAACAG 3780 

CAGGCGTAAG TGaGATTCAT CCCCACAAGC TCAGACATAC ATTTGCAACG CATTTATTGA 3840

ATCAAGGTGC AGACCTAAGA ACAGTACAAT CGTTATTAGG TCATGTTAAT TTGTCAACAA 3900 

CTGGTAAATA TACACACGTA TCTAACCAAC AATTAAGAAA AGTGTATCTA AATGCACATC 3960


CTCGAGCGAA AAAGGAGAAT GAAACATGAG TAATACAACA TTACATGCAA CAACAATTTA 4020 

TGCTGTAAGA CATAATGGGA AAGCAGCTAT GGCTGGAGAT GGGCAAGTAA CGCTTGGTCA 4080 

ACAAGTCATC ATGAAACAAA CGGCAAGAAA AGTGCGACGT TTATATGAAG GTAAAGTGTT 4140 
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A TT A C* A & f HO 


TTTAGTGGTA ACTTAGAAAG AGCTGCTGTT GAATTGGCAC AAGAATGGCG 4260

Artf3fY3ATA a a CAATTACGTC AATTAGAAGC TATGCTAATT GTAATGGATA AAGATGCTAT 4320

AGTGGAACTG GCGAAGTTAT TGCTCCAGAT GATGACCTTA TCGCTATTGG 4380

A 1 UAWsAvjCivJ AACTACGCAT TAAGCGCAGG ACGTGCATTG AAACGCCATG CATCGCATTT 4440

GTCTGCTGAA GAAATGGCAT ATGAGAGCTT GAAAGTAGCG GCTGATATTT GTGTCTTTAC 4500

CAACGATAAT ATTGTTGTCG AAACACTATA ATAATCAGAG CACGATAAAT AATTACGAGC 4560

AATTAATTTT AGTTAAAAGA CGGAGGAATG AAATTAATGG ATACAGCTGG AATAAGATTA 4620

ACTCCAAAAG AAATCGTATC TAAATTAAAT GAATACATCG TTGGACAAAA TGATGCTAAA 4680

CGTAAAGTGG CAATTGCCCT ACGTAATCGA TACAGAAGAA GTTTATTAGA TGAGGAATCA 4740

AAGCAAGAAA TTTCACCTAA AAATATTTTG ATGATTGGAC CAACCGGCGT TGGTAAAACT 4800

GAAATTGCAA GAAGAATGGC CAAAGTTC3TC GGCGCGCCAT TTATAAAAGT AGAAGCTACT 4860

AAATTTACTG AGGTAGGTTA TnT Ann APrt a GATGTTGAAA GTATGGTTAG AGATCTTGTT 4920

GATGTTTCAG *V A A ftl A TT* 2k f2*P AAAAAATCAT TGGTACAAGA TGAAGCAACA 4980

GCTAAGGCCA ATGAAAAACT x nj x x x x A TTAGTTCCAA GTATGAAAAA GAAAGCGTCT 5040

CAAACGAATA ATCCTTTAGA w X V^rVV. X X X X V* GGAGGTGCAA TTCCAAATTT CGGACAAAAT 5100

AACGAAGATG AAGAAGAACC ACCTACTGAG GAAATTAAAA CAAAACGTTC TGAAATTAAG 5160

AGACAGCTAG AAGAAGGCAA ACTTGAAAAA GAAAAGGTAA GAATTAAAGT CGAACAAGAT 5220

CCTGGTGCTT TAGGTATGCT AGGTACAAAT CAAAATCAGC AAATGCAAGA GATGATGAAT 5280

CAATTAATGC CTAAAAAGAA AGTTGAGCGA GAAGTTGCTG TTGAGACGGC AAGGAAAATC 5340

TTAGCTGATA GTTATGCGGA TGAACTAATT GATCAAGAAA GCGCTAACCA AGAAGCGCTT 5400

GAATTAGCAG AACAAATGGG TATCATCTTT ATAGATGAAA TCGACAAAGT TGCGACGAAT 5460

AATCATAATA GTGGTCAAGA TGTCTCAAGA CAAGGTGTTC AAAGAGATAT TTTACCTATA 5520

CTTGAAGGTA GCGTTATTCA AACCAAATAT GGTACTGTGA ATACTGAACA TATGCTGTTT 5580

ATAGGTGCTG GAGCTTTCCA TGTATCTAAG CCGAGTGACT TGATACCAGA ATTGCAAGGT 5640

CGTTTTCCGA TTAGAGTTGA ACTTGATAGT TTATCGGTAG AAGATTTTGT AAGAATTTTG 5700

ACAGAACCAA AATTGTCATT AATTAAACAA TATGAAGCAT TGCTTCAAAC AGAAGAAGTT 5760

ACTGTAAACT TTACCGATGA AGCAATTACT CGCTTAGCTG AGATTGCTTA TCAAGTAAAT 5820

CAAGATACAG ACAACATTGG TGCACGTCGA CTTCATACAA TTTTAGAAAA GATGCTAGAA 5880

GATTTATCAT TCGAAGCACC AAGTATGCCG AATGCAGTTG TAGATATTAC CCCACAATAT 5940
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AAATATACAA AAGGAGAAAA ATTCATGAGC TTATTATCTA AAACGAGAGA GTTAAACACG 6060 

TTACTTCAAA AACACAAAGG TATTGCGGTT GATTTTAAAG ATGTAGCACA AACGATTAGT 6120 


AGCGTAACTG TAACAAATGT ATTTATTGTA TCGCGTCGAG GTAAAATTTT AGGATCGAGT 6180 

CTAAATGAAT TATTAAAAAG TCAAAGAATT ATTCAAATGT TGGAAGAAAG ACATATTCCA 6240

AGTGAATATA CAGAACGATT AATGGAAGTT AAACAAACAG AATCAAATAT TGATATCGAC 6300

AATGTATTAA CAGTATTCCC ACCTGAAAAC AGAGAATTAT TCATAGATAG TCGTACAACT 6360

ATCTTCCCAA TTTTAGGTGG AGGGGAAAGA TTAGGTACAT TAGTACTTGG TCnAGTACAT 6420

GATGATTTTA ATGaAAATGA TTTGGTACTA GGTGAATATG CTGCTACAGT TATTGGTATG 6480

GAAaTCTTAC GTGAGAAGCA TAGTGAAGTA GAAAnAGAAG CGCGCGATAA AGCTGCTATT 6540 

ACAATGGCAA TTAATTCATT ATCTTATTCT GAAAAAGAAG CGATTGAACA TATCTTTGAA 6600 

GAACTTGGCG GTACGGAAGG CCTATTAATC GCATCAAAAG TTGCAGATAG AGTTGGTATT 6660 

ACTAGATCTG TAATTGTAAA TGCACTACGT AAATTAGAAA GTGCTGGTGT AATTGAATCA 6720 

CGTTCTTTAG GAATGAAAGG TACTTTCATT AAAGTTAAAA AAGAAAAATT CTTAGATGAA 6780 

TTAGAAAAAA GTAAAT 6796 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2073 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATCCTAAAAT TnAAAATTAT CACGCCTTTT GaACAGCTTT GTAACCaTCt GGACGATCAT 60 

kAAATTCCaA TGTAAATCCT GGTTTAAaGT TGATCTTTAA CCTTATTTAA AyCACCAATT 120 

GTACGTATAT TATGTTGTTT AGCAAAATCA CGTTTTACAG CTAAAGCATA CGTATTGTTA 180

TACTTCATTG GTTTTAACAT AGTCATTTGA TATTTCTTTT CAAGACTTTG CTTAGCTTGT 240

TCATAAACTT TTTTCTCTTC TTTTGACTTC AATGGTTCTT TTGTTAATTC ACCTAAAACT 300

GTTCCAGTAA ATTCTAAATA CCCATCTATA TCGTCAGATT TTAAAGCATT AAATAAAAAT 360

GCTGTTTTGC CCATACCATC TTTCACTTCT ACAGTATTTT TGGTCTCTTC TTCTATTAAA 420

ATTTTATACA TATTTGTAAT AATCGATGGC TCGGAGCCAA GCTTTCCAGC TAACGTAATT 480

TTATCACCTT TTTGTGCAAA CATAGGAATA GCGATAGCCA GTATAATAAT CATCACTATA 540
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TCAAATATAA TTGCCAATAA C3GCTGCTGGA ATTGCACCTA ATAATATCAA CGATGCATTG 660 

TTACGGTCTA TACCTAATAA AATTAAATCT CCTAGTCCGC CTGCACCAAT TAATGCTGCT 720

AGTGTTGCTG TACCTATAAT TAATACCATA GCCGTTCTTA CACCAGCCAT TATAACAGGC 780

ATTGCTATCG GAAGTTCGAC TTTAGTTAAA CGTCTAAATG GTTTCATACC TATACCTTTA 840

GCCGCTTCAA TGAGTGATGG ATCAACTTCT TTAATTCCAG TATACGTATT CCTTAAAATT 900 

GGTAACAACG CATACACTAC AAGTGCAATA ATTGCTGGCA CACGACCGAT ACCAAATAAA 960 

GGAATCATTA AACCTAATAA TGCCAACGAT GGTATGGTTT GAAGAATTGC CGCAATATTC 1020 

ATTACGATTT CAGATATCGT TTTAGTCTTC GTTAATAAAA TACCTAATGG TACCGCAATA 1080

GCAGTTGCAA TCAATAATGC GATAAATGAT ATTTGAATAT GTTCTATCAT TGTCGAAAAG 1140

AGTTGCCCCT TACGTTCACT CAATATGTCg AAAAAGTTAG TCATGTTGAG CTACCTCCTT 1200 

TTTCTGGGAC AAATATTTGA AGATATCTTT CCTATCAATA ACATATTGAC CTACGCTATC 1260

TTCTTGCATG ACAATGACAC GCTCGCTCTC TGATAAAAGT TGATACAATA CTTCAATTGG 1320 

TTGATTGTCA TAAACAATTG GATAAGCGCT CATAGATGTA ACCTCATCGA TTGGTTTCAT 1380 


AATATCCAAG TCACGGATAA TTGCGTTCTC TTCAACACAT GGCGCATCAT CTTCTAAATG 1440

ACTACCCATA AATTGTTTAA CAAATTCACT TTGAGGATTA TTTTTAAATC CTTCTGGTGT 1500 

GTCAATTTGT TCAATATGCC CTTCATTCAA AAGACAAATC TTATCACCAA GTTTCATCGC 1560 


CTCTTGAATA TCATGTGTAA CAAATATGAT TGTCTTCTTA ATTTTAGTTT GTAATTCAAT 1620

TAAATCATCT TGAAGTTTTT CTCGGCTGAT TGGGTCTAAT GCACTAAACG GTTCATCCAT 1680

TAAAATAACT GGTGGATCAG CTGCTAACGC ACGTATAACT CCTACACGTT GTCGTTGCCC 1740

CCCTGACAAT TCATCAGGTT TTCTGTTTTT ATATTTTTCA GGTTCTAATC CAACCATTTC 1800

AAGTAATTCA TCTACTCTTT TATCTATATC TTTTTCTTTC CACTTTTTCA TTTGTGGCAC 1860

TTGTGCAAtA TTTTCTTTGa wTGTCaTATG TGGGAATAAT GCAATCTGCT GcAATACGTA 1920

TCCAATATCC CAACkCATTT CGTATACTGG ATAATCACTT ATTGGTTTAT CTTTAAAATA 1980

AATATAACCT TCACTTAAGT GAATGAGTCG ATTAATCATT TTTAATGTCG TAGTTTTTCC 2040

ACAACCTGAA GGTCCAATTA GCACAAAAAA TTC 2073 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13321 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

ACTATTCTAG CTTCATCAGT TATCATATAT TCTTTGAAAC ACTTGTAAGA AAATATAATG 60 

AGTATTTACT ACATAATGAT ATTTCAAATT AGAAAAAAGG AAGTTATGAT TTAATGGCCT 120 

TGAGCCTATC ATAACTTCCT TTTATCATTT TATTGTTGTG TTGATGTTTC GATAACGTGG 180 

TACATCTTAT CAAACATCAA TTCGAAACCA TGCACCATGG CATCATGATA TTCTTTTTTC 240 

TTTTGCTTGT ATTCTAAATT AGTAAATCGT CTTTCTTTTT CAACTAATGA ACGATAATAA 300 

AATAGCATTT GGGTGCCACC TGTTTCACGT TCAAAAAATT CTACCTCAAT GACATCTTGC 360

GTTTCACTTA GTCCAGGCAT ACCGATAGTC ATCTTAACGT ATTCATCCAT AACTAAAGAT 420 

TCATAAATGC CTTCAATCAC ATTTACTTTG CCATTACGTT GTTGATCTAC AATACGATAT 480 

TTACCGCCTT CTTTAACGTC CGCTTCAATC TCTTTATTCG TTCTGGCTGA TGTCATAAAC 540 

CATTGTTTCA ACAAATCTTT CTTTGTCCAA GCTTCGTATA CTAACTCTGG AGAAAATTTA 600 

TAAAGCTTTT CAATTTCAAC TTCGACATGT TCATTCTCTA CATTAAATTT TGCCACTGTT 660 

GTCCACCCAC TTTCGCTCTT ACTTTTATTT TAACGTATTT TTGCTCAGTT CCAAACATAG 720 

ATGATCATCA TTTTTAAAAG ATTAGCGTTA TACGGTGAGT ACAACATGAT CTGTTAATAT 780

AACAAGCCAC CTTACTTGGC TACATCGATA TATTGTTAAG CATTAATGTT TCATTTCTTG 840

ACTAGTGTTC TTTTTTAGCT TTGGAAAATT AAATAAAATC GCAATAAGTC CGCATACACC 900 

TAATAATATA GGATAAATGC TGTATGGGAA TAACATTAAC GGTGAAATAC CAGCTACACC 960 

AGCCGCTGaA ATGACTTGCG GGCTATATGG TAATAAACCT TGGAAGCAGC CTCCAAATAT 1020 

ATCAAGAATA CTTGCTGATT TCCTTGAATC TACATCATAT TCATCTGCAA TATTTTTAGC 1080 

TAAAGGACCT GACATAATAA TAGAGATGGT GTTGTTTGCC GTGGCAATAT CTGCGACACT 1140 

TACCAAACTA GCAATTCCTA ATTCTGCGCC ACGCTTTGAT TTCACTTTAG AGCGAACAAA 1200 

TTGCAACAAC CATTCAATAC CACCATTGTG TTGAATAATA CCGACTAAAC CACCAATTAG 1260 

CAACGCAATC ATAGCAATAT CTTCCATGCT TATAATACCT TTGGACACTG CATCTAGTAG 1320

CCCCATCCAA CCGAATGAAC CATCTATGAG ACCAATGATT CCGGCTAATA ATGTTCCGCC 1380 

AATCAATACG ATAATGACAT TTACACCTAA TAATGCTAAT ACCAATACTA AGATATACGG 1440 

TACAACTTTA ATTAGATTAT AATCATAGTt TTTAGCATGA TTTAAAGAAA TGCCATTCGT 1500

TAAGAAATAC AGAATAATAA TCGTTAAAAT AGCACCTGGC AATACAATTT TAAAGTTTAC 1560 

TCTGAATTTA TCTTTCATTT TCGTATGTTG TGTTCTAACC GCAGCAATTG TTGTATCTGA 1620 

AATCATTGAT AGATTATCGC CGAACATTGC ACCTCCAACA ACTGTAGCCa tTGctAGCGC 1680
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TCCTACAGAC GTCCCCATAG ATATAGAAAC AAACATACAA ATCACAAACA ATCCTACAAT 1800 

AATTAAATTT TCTGGGATTA ATGATAGTCC TAAATTAACT GTCGACTTTA CGCCACCCAT 1860


TTTTTCAGCT GTATTTGAAA ATGCACCTGC TAAAATAAAA ATCAACATCA TTAAAACAAT 1920

GTTTGAATGG CCTGCACCTT TCGTGAAGAC CTCAACTTTT TTAGCAAATG ATTCTTTTCG 1980

ATTCATTAAT AACGCCACAA TTACCGTTAT CGTAATTGCA ACATTTAATG GCATTGAAGT 2040

AAAATCACCT GTGATAATAC CTACGCCTAA AAACAACGCC ACAAATAATA ACAAGGGGAA 2100 

TAATGCCCAA GCATTGCTCT TTTTATGTAC TTCCATCCTT TTTACCTGCT TTCCAATTAA 2160 

AAATACCTCT TTCTCACAAA CGATGAAGAA AGAGGTTTTC ATGTGCTTTA CCTGCTTATC 2220

TTCAAACCAT TACGGTTACT GGAATTGGCA CATTCGAGAT GTTGCCGAGG CTTCATAGGG 2280

CCAGTCCCTC CACCTCTCTA GATAAGTGAT GCTTATTTAC GTTTACGTTA CAAGATAATC 2340


CTTAGTACGT CAATCATAAA TTAATCAGGA GTCGTATAAT ATTTTTCATA AACAATCATT 2400

GCTACTGTAA TAATAATCAA AACAATAATG CTAATAACAA GTAAAAGCCA CCATTTAAGC 2460

ATTAATGCAA TAAAAATGAA CACGATAGAC ACACTTACTA ATATTAATGA TATGACTTTA 2520


AATTGCTGAA CACGTTGCTT GGAGATGACT TTCAACTGTT TGTTTGATAG ACGCGTATTT 2580

TTTATACTGA TTCCCAGTAT ATTTTCTAAT ATTTGAACCA ATACGATACT TATTGCAAAT 2640

ATAATAATTG GTAAAACATC ATAGCTCCCT ATAGTTAATG TATAAATTAC AAATCCAATG 2700

TAAAGTAACC CTGAGACAAA GGATAAAAAG TATGCGACGT ATTTGTTAAA CTTAATGATA 2760

TGCTTTTTAA CGTTTTGATG TGTAAACCAT ACATTCGAAA CGATCGCAAC TGCTACAAAT 2820

AATGTGAATA CTATATATAA TGGTAATTTT TGTTCAGGAA AAACAGTCGC TATTCCAAAA 2880

GCTAATGCTA AAATCAAAAA TAATATAGCT CTAGATACTA TTAATGCCAT AATAACAACC 2940

CCTTTGTTTA ATATCGAGTT TGCAAATTTA CGTTTATCAG CGTTTCTATG ATCAGTACTT 3000 


CTACGGGTAG CGTTTCTATG TAATTTACAT CATCTTAACA TATAAATACT TCGCTATTTA 3060 

ATTGAAAACA TATCCTATTA TTCTTTGTCC GTTCTGACGT TTAATATCTA GCCTTAGGCA 3120 

TTTCACTTGT TAATGAATTT AACTTTCTTC CACTAACCGT CCCTAAACCC AATCCCGCAA 3180

CAGTTTTTAA CTTTTTCGTT GTTGTCCTGA CATCCTCATT AAGAAAGTTT ATTCTGCTTA 3240

AAACTTATAA TCCACACCCT GAGCAAACGC TCCTTATGAC AGAGTATTAA AATAAGCCGA 3300

TAAAGATACA CACCTTTACC GACTATTTAA AATACACTTC ACCAATTCAT TTTAATTTAA 3360

TGGATTGAAG TAACTAAATT AATATTATGT TGTTCAATTA AAAGCTTCAT ACAAACCTAA 3420

TCTATTTGCA CTCCACCGCT AACACCGAAC ACTTGTCCGG TTGTATAACT TGATTCTTCT 3480
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GTTTTTTGAC CAAATGTTGG GATTTTACTT TGAGGTTGTC CACCAGAAAT TTGTAATGGT 3 600 

GACCAGAATG GACCAGGCGC TACACAGTTC ACTCTAATTC CTTTTGGTCC TAATTCTTCT 3660 

GAAAAACTTT TAGTTAATGA AATAATTGCT GCTTTTGAAG CGGCATAATC ATGAAGAATA 3720 

GGACTAGGAT TATAACCTTG TACAGATGAT GTCGTTGTAA TTGACGCACC CGGTTTTAAA 3780 

TATTCCAATG CTTTTTGAAC TGTCCAAAAT AGCGGATAGA CATTCGTTTC AAATGTTTCT 3840

GTAAATGCCT CAGTTGTAAA TCCATGAATA TCATCATGAT ACTGTTGATG TCCAGCAACT 3900

AAAGTAACAT TATCTAAGCC ACCTAATTGT TGATATGCTT GTTCAACAAG GTCATAGTTG 3960

AACTGTTCAT CTCTTATATC ACCAGGAATT AACACTGCCT TTTGACCACT TTCTTCAATC 4020 

ACTTGGCGTA CTTCTTGTGC ATCTTGTTCT TCACTCGGAA GATAGTTAAT CGCTACATCT 4080 

GCACCTTCTT TAGCATACGC AATTGCTGCT GCACGCCCTA TTGCTGAGTC ACCACCTGTG 4140

ACTAATATTT TATAGCCTTG TAAGCGTTGA TGACCTTGGT AAGACGTTTC GCCACAATCG 4200

GGTGCTGGCG TCATTTCAGA TTGTAAACCC GGTACCTCTT GTTCTTGTTT TTCATAATCC 4260

GTTGTTTTAA ATTTTGTTCT AGGATCTTGA GCTGCCATTT TTTTACATCT CCTTATTCGC 4320

TTAATGGTTA TTATTTACCC AATCTTCCTA GGAACTTAAT CATGATTACA CTAAAAATTA 4380

CTTTCTTCTT TATAAAAACA AGCTCGAATT ATTCATGCAA TAGTCTCTTT ACAAATTCAA 4440

CAAAATACTC AGGTACTTTT TCCAGAATCC TTTCATCCGG TTTATATTGA GGATGATGTA 4500 

AATCATATTC ACTATGAGAA CCAATTAACG CAAATACACT TGGAAAATGT TGACTATAAC 4560 

CTGAAAAATC TTCTCCAATC GTAAGCGGCT GTTCCATCAT TCCCACCTTA TATCCAACAT 4620

GTTGGGCTAC TGCAATTGCT TTATGCGTCA ATGCCTCATC ATTCATCACA GCGCCAGGTA 4680

AATGCGTATA ATTTAAATTA ATTTTCATAT TATATGCTTG AGCCAATCCG TCCGCAATAT 4740

CTTGJAATCG TGTTTCTACA AGCTTTCGTA CCACAGGATC AAAACTACGC ACTGTGCCTT 4800

GTACATACGC ATGATCAGCA ATGACATTCC AAGTATTACC ACATGATATT TGTCCAATTG 4860

TTACTACCGC TTCATCAAAC GCAGATAGAT TTCTACTAAC TATGGATTGA ATACTATTAA 4920

TCAATTGCGC CAACACAATA ACTGGATCGT TGCATTGTTC TGGcTTTGCA GCATGACCAC 4980

CCACGCCTTT AATATGAAAC TCAAAACGAT CTACTGCTGA TGTAATTGCC CCTGTTTTGA 5040

TTGCAAATGT ACCTACCGAA CGCGATGGGT CATTATGAAA ACCCAATACT GCTTGTACAT 5100

CTTTTAATGC ATGTGTTTCA ATAATTTTAA AAGCGCCATG TCCTAGTTCT TCTGCTGATT 5160

GAAAAATGAA TTTAACACGC CCAGTAAGAG TGCCCTCAAT TTCTTTTAAT TTTACAGCTG 5220

TAGCCAAAAT ACTAGCCATG TGAATATCAT GACCACACGC ATGCATAACA CCTTCATTTT 5280
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CAGCTATACA ACTCAGACCT TGTCCCACTT CAGCAACAAG CCCAGTCGCA AGTGGTAAGT 5400 

CTAATATTCT AATATGATGT TCTGTTAAAA TATCTTTAAT TTTTTGTGTA GTCTTAAATT 5460 


CTTTATCGGA TAGTTCTGGA AATTGATGAA AATACCTTCT CCAGGTAACA GCTTGATCTT 5520 

TTAATCCCAT CGGTCATTCC CCTTCCTTAA GTCAATGATA TGTTGTCTAC CCTACGATGA 5580 

TCATCTTTGA CTATTAAACG ATGATTTCAC AACAATGTAC TCTTGTTAAT TGCTTTCGTT 5640

AATGATAGAC AGTTGTTTAA TAATATCGTA ACACTGTTGT CAAACTATTC TAACTTTTAT 5700 

AATTGAGACT CTATACAAAA ACGTGTTCTC GAATATACTT GTTTTTACAA ACCACAAAAA 5760 

GCTCTAAACA TTAGTTTAAA CCAATGCTTA GAGCTTTCTA ATTATTTTAT GCTTTAAAAG 5820

ATACTGTGTT ATCTACGATG ACCTTACCGT CTTTAATAAC TTTTTCTGCG TGATTGATAC 5880 

CAAAATGATA TGGAATATAT TCATGATTTG GTGCATCCCA AATTACTAAA TTAGCCTTAT 5940 


CACCTGTGTT AATTGTACCC GCGTTAATGT CTATTGCTTT AGCAGCATTG ACCGTAACAG 6000 

CATTCCAAAC TTCATTAGGT GATAGCTTTA ATTTCAAGGC TGCAATCGCC ATAACAAGTT 6060 

GTAAGTTGTT TGTGACACTA CTACCAGGGT TATAATCAGT TGCTAATGCA ATCGCACCGT 6120 


TATTGTCAAG CATGCCTCTT GCATCTGCAT AATCTTCTTT ACCTAAATAG AACGTCGTTG 6180 

CAGGTAAGAG GACAGCTACA GTATCACTAT TTCGCAACTT TTCTTTTCCT TTATCACTAG 6240 

AAGCTACTAA GTGGTCTGCT GATATTGCTT GTTCATCAAT TGCTAATTCC AGTCCGCCTA 6300

ACGGATCAAT TTCATCCGCA TGTATTTTCA CTTTAAAACC TGCTTCTTTG GCTTTTTGCA 6360

TATAATGTTG CGATTGTTCT ATTGTAAATA CACCTGTTTC ACAGAAAATA TCCGCAAAGT 6420

CTGCATATTG TTTTACTTCC GGAAGTAACG CAATCATTTC TTCTAAAAAT GCCTCATTTG 6480

AACTTGCCTC TTTAGGTACA GCATGAGGCC CTAGGAAAGT ATGTTTCATG TCTAAATCAT 6540

ATTTCTCAGC TAAACGATTA GACACTTTCA ATTGCTTCAG TTCATTTTCT CTATCTAATC 6600


CATAACCACT CTTACTTTCA ACTGCAAGCA CGCCGTGTTT AATCATAGTA AGCAAATCAT 6660

GCTCTGCTTT TTTAAACAAG TCATCTTCGG ATGTTTCTCT AGTAGCATTA ACGGTAGATA 6720

ATATGCCACC ACCCATTTCT AATATTTCAA GGTAAGACTT ACCTTGACGT TTTAATGACA 6780

TCTCATGTTC TCGAGATCCA CCAAATGTTA AATGGGTATG TGCATCTACT AATGCTGGGG 6840

ACACTACCTT CCCACTAGCA TCAATCGTCT CAGTCGCATC GTAGTCATCT GTATGTGTTC 6900

CAGCATATAC AATTTTGCCA TCTTTAATGA CAACTGTACC ATTTTTCACA ACATTTAATT 6960

CATCTAATTC CTTACCCTTC AAAGGTTTAT CTGTTGATCT CGGTAAAATT AATTCTGCTA 7020 

TATGATTAAT TATTAAATCA TTCATTACTT ATCACCTGCT TTATCAATCA TTGGAATATG 7080 
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AACACCCATA CCTGGGTCAG TCGTCAATAC ACGTTCCAAT CTTCTTTCAG CACOCTCTGA 7200 

TCCATCTGCT ACAACAACCA TACCCGCATG AAGTGAATAT CCCATGCCAA CACCGCCACC 7260 


GTGATGGAAT GAAATCCATG AACCACCTGC AGCTGTGTTA ATGAGTGCAT TCAATACAGC 7320 

CCAATCACCA ACCGCGTCAC TACCATCTTT CATACTTTCT GTTTCACGGT TAGGACTAGC 7380

AACTGAACCA GCATCTAAAT GGTCTCGTCC AATAACAATT GGTGCTGAAA TTTCACCGTC 7440


ACGTACAAGA CGATTTAAAG CTAAGCCCAT TTTCGCTCTT TCTCCATAGC CTAACCAAGC 7500 

AATACGTGAT GGTAGTCCTT GATATGAAAT TTTTTCTTCA GCTAAATCAA GCCATCTTAA 7560

TAACTTTTCA TTTTCTGGGA AAAGTTTGCG CATTTCTTCA TCCGCAOGCT CGATATCTTT 7620

TGGATCACCA CTCAACGCAG CAAAGCGGAA TGGCCCTTTA CCTTCACAGA ATAATGGTCT 7680 

AATGTAAGCT GGTACAAAGC CTGGGAAGTC AAAAGCATTT TTCACTCCGT TATTGAAGGC 7740 

TACTTGACGA ATATTGTTAC CATAATCAAA TGCTACAGCG CCACGTTTTT GGAATTCAAG 7800

CATTAATTCA ACATGCTTTG CCATTGAAGC TTGTGACAGT TCAACATATT TTTTCGGATC 7860 

TTTTTCACGC AATACTTTCG CTTCTTCTAC AGAGTATCCT TGTGGCACAT ATCCATTTAG 7920 


CGGATCATGT GCACTTGTTT GGTCAGTAAT AATGTCAATT TTAAATCCTT TTTCTAGAAT 7980 

CGCTTGATGG ATGTCTACAG CATTTCCAAC TAACCCGATT GATAATCCTT CTCCACGTTC 8040

TTTCGCCTCT TCTGCTAATT TTAATGCTTC ATCTAAATCA GCTGTTTTAA CATCACAGTA 8100

TTTCGTATCA ATTCGCTTAT CAACACGTGT TTCATCAACA TCCACGCAAA TTGCTACCCC 8160 

ATGATTCATA GTAATTGCTA ACGGTTGCGC ACCACCCATA CCACCTAAAC CTGCTGTCAG 8220 

TGTAACAGTG CCTGCTAAAT CTCCATTAAA GTGTTGATTA CCTAGCTCGG CAAATGTCTC 8280

ATAAGTACCT TGCACAATAC CTTGAGAACC AATATATATC CAACTACCGG CTGTCATCTG 8340

TCCATACATG ATTAAACCTT TTTTATCTAA TTCATTAAAA TGATCCCAGT TTGCCCATTC 8400


AGGCACTAAT ACTGAATTTG AAATTAATAC ACGTGGCGCT TCTTCATGTG TTTTAAATAC 8460

AGCAACTGGC TTTCCTGATT GTACTAACAT TGTCTCATCT GATTCTAATT CTOGTAACGT 8520 

TTTCTCTATT GCTTCAAAAG CTTCCCAATT ACGTGCTGCT TTTCCAATAC CACCATAAAC 8580 


AACTAAATCT TCTGGTCTTT CAGCAACTTC TGGGTCTAAA TTGTTGTATA ACATTCTAAG 8640

TACTGCTTCT TGTTCCCAAC CTTTACACTC AATACTCAAA CCTTTTTTTG CTTGAATTTT 8700 

TCTCATAAAA TTCGCTCCTG TTCTTTTAAG AAGTTAATTC CACTAAATTT AAAACGCTTA 8760

CATTATTATC TTCAATATTC ATTATAGTAT GTTAAAATAT AGCCAACAAA TATAAATAAA 8820 

CTAATTATCC ATAGCTTGAA TCTATAAATA AAAGGAGCAA AACACATGAA AATTATTCAG 8880 
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CATATTAGCC AGCCATCTTT AACTGCTACG ATTAAAAAAA TGGAAGCAGA TTTAGGTTAT 9000 

GACTTATTTA CACGTTCAAC AAAAGACATC AAGATTACCG AAAAAGGAAT ACAGTTTTAT 9060 


CGTTATGCGA GCGAATTAGT TCAACAATAT CGATCCACGA TGGAAAAAAT GTATGATTTA 9120 

AGCGTTACAT CAGAACCAAG GATAAAAATT GGGACTCTTG AATCTACGAA TCAATGGATT 9180 

GCGAATTTAA TTCGAAAGCA CCATTCCGAC TACCCTGAAC AGCAATATCG TTTATATGAA 9240

ATACATGATA AACATCAATC TATAGAGCAA TTACTGAATT TTAATATTCA TTTAGCTATA 9300

ACAAATGAAA AAATAACCCA CGAAGATATA AGATCCATTC CTTTATATGA GGAATCTTAC 9360

ATTTTATTAG CACCCAAGGA AACATTTAAA AATCAAAATT GGGTAGATGT TGAAAATTTG 9420

CCACTCATAT TACCAAACAA AAATTCTCAA GTGCOCAAAC ACTTAGATGA CTATTTTAAT 9480

AGAAGAAATA TTCGTCCAAA TGTCGTTGTA GAAACAGATC GATTCGAATC AGCAGTTGGA 9540


TTTGTTCATC TCGGCTTAGG TTACGCTATC ATTCCGAGAT TTTATTACCA ATCATTTCAC 9600 

ACGTCTAATT TAGAATATAA AAAAATTCGT CCAAACTTAG GCCGAAAAAT TTATATCAAT 9660 

TACCATAAAA AACGCAAACA CTCCGAACAA GTACATACAT TCGTACAACA ATGCCAAGAT 9720 


TATTTATATG GACTTTTAGA GGCTCTTTAA CTTAAGTTAT TAGAGCCTCT TATGCAGTTG 9780 

CTCAGTCAAC TGTATACCTT TTGCCTTTAA CTTAAGTTAT TAGAGCCTCT TATGCAGTTG 9840 

CTCAGTCAAC TGTATACCTT TTGCCTTTAA CTTAAGTTAT TAGAGCCTCT TATGCAGTTG 9900

CTCAGTCAAC TGTATACCTT TTTCCTTTAA CTTAAGTTAT TAGAGCCTCT TATGCAGTTG 9960 

CTCAGTCAAC TGTATACCTT TTGCCTTTAA CTTAAGTTAT TAGTGCCTCT TATGTAGTTG 10020 


CGTAGTCAaC TGTaTACCTT TTGCCTTTAA CTTAAGTTAT TAGAGCCTCT TATGCAGTTG 10080 

CGCAGATCAT CGTATAAAAA TTAATGACGT CATTTCAAAA ATCGATACAA AAATAATTTA 10140 

TTATAAAAAT TCTAAGAAAG AAGTGAAGCA GATGTTAAAA TCTATTAATC ATATATGCTT 10200 


TTCAGTCAGA AATTTAAACG ATTCAATACA TTTTTATAGA GATATTTTAC TTGGGAAATT 10260 

GCTATTGACT GGTAAAAAAA CTGCTTATTT TGAGCTTGCA GGCCTATGGA TTGCTTTAAA 10320 

TGAAGAAAAA GATATACCAC GTAATGAAAT TCACTTTTCA TATACACATA TAGCTTTCAC 10380

TATAGATGAC AGCGAATTTA AATATTGGCA TCAGAGGTTA AAAGATAATA ACGTGAATAT 10440 

TTTAGAAGGA AGAGTTAGAG ATATTAGAGA TAGACAATCA ATTTACTTTA CCGACCCTGA 10500 

TGGTCATAAG CTAGAATTAC ATACTGGCAC ACTTGAGAAC AGATTAAATT ATTATAAAGA 10560

GGCTAAACCA CATATGACAT TTTACAAATA AGGTGTCATT ATAAAAAGGC CTCTTGAACT 10620 

CCGTTAAAAT TTTAATTAAT TATTATATAA TAAGAGAACT TTTCAAACAA TACAGTTGTT 10680 
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TTACTGCAAT TATTTTTCAA ATATATCAAC GTTAATATAA CTTCTATTAA GAAATACTCA 10800 

CATTCTGCCC TGCAATGCAA ATCTCGTCAC ATATAAATAT TTTTAATTAT TTTAAAAAAT 10860 


GATGCACTAA ATTAGCAACG AGCTTAGCAG TTCTATTGTC AGCGTCATAT GTTGGATTCA 10920 

TCTCAGCAAT ACTAACTGAA GACACCTTAT CACTTGGAAT AATACGTTTT GCTAATTCAA 10980 

GAACAGTATG TGGATACAAA CCTAACACTG CCGGCGCACT TACCCCAGGC GCAAACGCAC 11040 


TATCAATGAC ATCCATACAA ATCGTAAACA TAATGACATC ATGTTCATGT ACAAAACGTT 11100 

CAATCATATC TTTAATTGTT GGTGATACGT GACTCAATAA TTCATCTGCA AAGACATAAT 11160 

CAATCTTTTT CTCTTTAGCA TAATCAAATA AACTTTGCGT ATTACCACCT TGAGCAATAC 11220

CAAGCACTAA ATAATCTGTG TTTTCATCTT CTTCTAAAAT TTGTCTAAAG CTCGTTCCAG 11280 

ATGTAGATTG TTGTTCAGCA CGTGTATCAA AATGCGCATC AATATTTATC ACACCAATAG 11340 


ATTGTGTTGG ATAGACTTTA CGTGTTGCTA AATATTGAGC ATACGCAATA TCATGTCCAC 11400 

CACCTAATAA AAATGTTTGT CTATGATTAG CAATTGACTT CGCTGCAAGC ATAGCAAATT 11460 

CTTTTTGAGT ATCAATTAAT TCCTCATGAT CATGATAAAC ATTTCCGTAA TCGACTAAAG 11520 


TTcACATTGA TTCAAATCCG GCAAACCTGC AAATGCTTGT TTAATCGCAT CTGGTCCTTC 11580 

TTTTGCACCA ATGCGCCCCT TGTTTAAAGC AACACCTTTG TCAACAGCAT AGCCTAATAT 11640 

ACCGACCCCT GATGGCATAC TACTCTTTTC CAGCTTAGAC AAATCTTCAA ATGTTACTGT 11700

TTGAAAATGT CTAAATTTTT TCGGGTCTGT TTCACTATCT AACCTTCCAG TCCATAAATT 11760

TGGTTCACCT TGCTTGTACA CAGCATTTCC CCCTCTTATT TATGTGGCTT ATTAACAATT 11820 

AAAGTATAAC GTATAGGAAA TTTTGAATTC AATTCATAGT TAAATCCGTA TCTTAAAAAT 11880

ACTTATCTAC ATTACTTTTA CCCCTATTTT CTATGTAATA AOGAATACTT AGCTGATTTA 11940 

TGTTAATAAA ATACGTCAAG ACTATTACAT TTTCATTAAT ATTGACATAG ACAATTTATC 12000 


TCTCGGCTTG TAATATGTAT AATTGTTACT AAAAGATATT TTGCTTGTTA CCTAATGGAG 12060 

GTTACATATA ATGAAGAACA ATAAAATTTC TGGTTTTCAA TGGGCAATGA CGATTTTCGT 12120 

CTTCTTTGTC ATTACAATGG CGTTATCCAT TATGCTCAGA GATTTCCAGT CTATAATTGG 12180

TGTCAAACAC TTTATATTTG AAGTTACAGA TCTAGCACCA TTAATTGCTG CAATCATTTG 12240

TATACTCGTT TTCAAATATA AAAAGGTCCA ACTTGCAGGT TTAAAATTCT CAATCAGCCT 12300

GAAAGTAATT GAACGTCTAT TGCTAGCTTT AATTTTACCT TTAATTATTC TAATTATTGG 12360

TATCTACAGC TTTAATACAT TTGCAGATAG CTTTATTTTA TTACAATCAA CAGGCTTATC 12420 

AGTACCTATT ACACACATTC TGATTGGACA TATTCTGATG GCGTTCGTAG TAGAATTCGG 12480 
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is 



ACTTATGGTA TGTTGTTGGT TTGATGTATT CAGTTTTCTC AGCAAATACA CAGAATTTGC 12600

CTTGOTGAAT TGCTTATAAC TTCCTTTATA CATTCTCATT CTCTATGATT TAATTAGAGC 12660

GCTTCAATGA GACTAAAGGA CGTACAATTT ATATTGCAAC GACATTCCAT CATTCGGACT 12720

ATCAAAGTCA TATTTTCTTG TTTAGCGAAG AAATCGGCGA TCTATTTTCA TCGCCATTTC 12780

ATTATCCGAG AACAGCAATC GTTGCAGTAG GATACATTGG TTTAAGCTTA GTATTGCATA 12840

AATTATTTAG TTTAACAACA AGACGAAACC TTGAAGAACT TGAGCCTAAT ACCATGTCAA 12900

TCTTCAAATA TGACGATGAA GAAACTAATC ATACTGAGGC TGAAAAATCT TTAAAGATGC 12960

GCTAAAAATG TGAAAAAACA GGTGTAGCTA CTGCATCAAC GGTTGGTGTT ATACTGAAAA 13020

AAAACAGAAC TACAGTGGCT GACGAACCAA GCATTCATGA AGGTACTGAA CTCAACATCA 13080

GACATCACTT CATAGGTAAT CAAACTGAAT CTAATCATGA TGAAGATCAt CGGAGTCAGT 13140

gATTTaACAA AGAATCAGCm GaATCAGTTA AACAAGCACC ACmAAGTGAC ACGATTCAAA 13200

TATAAAGAAG TGAAGATGAA ATAGAGCAAT CATTAnAAGA ACCTGCGACT ACAGACGTnC 13260

GAAGAnCAAT ATCAGTTGTA ATTGATGCAG AAAAACATAT CGAAAAAGCT CTTCAGATAA 13320

A 13321

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8549 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

ATGTGTTGTA AACTTTTATG TTGAAAAAGC TACTTATCTC AATGAAAACA AGTAGCATTT 60

AATAAATTAA TTAGTATACA GCTAGTTTTT CTAATTGTTC TTTAACTTGA ATTAAGTTTG 120

ACCGTATTAG AGAGGCAGAT TGATCCATCG TTTGAATTGC TTGTCCTTCA TTTTCGTTCA 180

AGCCATTACA AACAACTTCA AACTGTTGTG CCATTTGATC AAGACGCGCA TGAGCTTGTG 240

TGTTTAAAAT AAACATATCG TCATAATGTG ATGGCGAATA GATAATTCGT CGTTGTATAC 300

AAACGTATAA AAACCTTGTC ATATCAACGG TTTTGGCATT TTTAAACCTC TGTGTTTTCC 360

ACGCATGTTT GCCCTTATTT AAATAATTTG CCCTTTTTTC GCCCCGAAAA AAAAACACAA 420

AAAAATAACC ACACTCCTAA ATTAATAGGT GGTGTGGTTT TGTTGATTGT AGGGGTATAA 480

AAATAACCGC ATTATTAAAG ATACGGTTAC TCTGTTATCT GTAAATATAA TAGTAGTTTA 540
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AAACAGGACT CCACATAAAA ATCAACTCCT TTATATACCA TAATGATACT ATATTTTCTA 660 

GTTTATTTCA ATTTTTCAGT TTTTAAAAAT GAGTTTCTGT TTTTATTTAT ACGCTTTTCT 720 


GTTTTCTTTT TAAATTTTAT CTTTTTGTTA TTCCATTCAT TGTAAAATTC TATTAAATTA 780

ACATAAAATT TTTCATGCCC TATTTTATTT GTTGATGAGA TATCAATGTA AAGACTCAAT 840 

ATTGTTTTTA AATAGATTTG ATGCAACGAC TGATAAACCG TATTACTATC TGCTATGTTA 900 


TTGGTAAAAT GCATAGAAAA ATATTCTAAT TTATTCATGC AATATATATG GGTTTCATTA 960 

TACTTCTTAA TGAGTGTATT TATACCTTGC AATACGTCAT TACTTTTAAT AACAATTTCT 1020 

TTTTCACCTG TCGAAAAAGT CCACTGTTTA TCTCCTATAT TTTCTTTAAT TGTTTTCTTG 1080

TTGTCAAATT CTAAAATTAT AGCCCGTAAA CACTCTTCTT TATAATTCTC GTTCTTGAAA 1140

GTACGAAGCA AAATTTTTAT AAATTCGGTA TTGGTGACTT TTTTATAAGT GTGATATTTT 1200 


GCAATCTCTT TATCAGTAAA GACTGTTCTT AGTTCGTGAT TATCAAAACT TAAATTCATC 1260 

TTATTCTCTA ATTCATTAAT TTTATCTTGC AAACCAACAT TTTCTAAAAT TTTCTTGTTT 1320 

ATCTCCCCTA TATCAAAACT CCTTTTCGAA ATTAATTTTG AAAACTCGTC TGCCATTTCA 1380 


ACAGCCTTTT CTTTCCTTTT ATACCTTTTG TTAAATTTAT GAACCACCGT TGCAGCATAA 1440

TACGATATCC CACCAGATAA AATAGATGaT ATTATCGGTA TGTATATATC ACCTTTCATA 1500 

TTTCCACCTC TTTTAACACA ATTAAGTATT ATGATACACA ACTTGCGCAA AAAGATGTAG 1560

ACAGAACATA ATGGCGAACA AAAACAACCA CCCAGTAACT AGTATGGGTG GCGTAgACTA 1620 

TAACAACTCT ATGTTATCAA GATATATGTA TCGAGTGATG GCAAGGAAGA AGTCTCCTGC 1680 

GGGACCAACA GTCAGATATA TGGCCTCTGC CGGGCTATAT AGTTCACTCC TACTATATAA 1740

AAGTAAGTAT AACATAAAAA GCACCCCGTA AACTGTTATA CGGGAATGCT AAAGTCATAT 1800 

ATACTACGGG GAGTAGTATG AAAACTATGC TCTCTATCGT AAGAAAAAAC ACCCAGTGAC 1860 


ATGCTTGGGT GAACAAGGAT AGATGTAAAT AGTTGATGCA TGTGTAcACA TCATAACAAA 1920

AAACTAGCCC GAAGcTAGCT ATAACATAAA AAAATAGGCA AGTACCGAAG TACCTGCCAG 1980

TTACGCACAT TTAAATCTTG AGAGTAATGT TAAAAAGTGT ATAGGAATAT TAACATCCAT 2040

CCAAATAGTT ATTTAATAAC TGTAAGATTC CCTATAATTA ATGTAGCaAA ATTTTTATTC 2100 

TAAGTAAATA CTAAATCGTG CTAAACTTAC CAAAACTACT TATTCTATTA CCTGCCTTGT 2160 

CTACCTCTCC TGTCGCTATA TAACGACGTT GTCCACTATT AGCAATATAA GTAATCCATC 2220

TATAGCCATT GATGCAATAT GCGCCGTCAT ATTTAATTGT TGCGTTATTA GGTAATACAC 2280

CTGTAATTCT TGAATTAGTT GAATAGCCGT CCCTTACGTT ATTACCTTTA ACATTGGCAA 2340
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CTGGCACTGG 


TGGATTTTTT TGGTTTTTAG CTGATGTTTT AACATTACCA GCTACCAAAC 2460

CACCTATAGG CTTACCATGA ATCGCACCGG CTATTAATTT AGAATACAAG TCATAGTTTT 2520

TCTTAATCCA ATCCATATCA TTTTTATTAG TAATAAAACC TAATTCAGAT AAACGATAGT 2580

TTATATTTAT TTCTGCTGAT ACATTAACGT TTAGTAAATC ATTACGAGGT GTTACACCTC 2640

TTATTTGTCC TAAGTTATTT TTAATAACAT CTTGTATACT TTTATCAATA GTATCTGCAT 2700

TGAATTGACT TGAAATAATA ACATGCCCAC CACTTGCACT TTCTCCTGCT GCGTCTAAAT 2760

GAATCTCTAG AACAATGTCA TACCCATGTG ATTTAACCCA ATATAAGCCA TAATCTTTAT 2820

TATTTCCTAC ATTAACACCG TAAGCAGTAT CTTGATACAT ATCTTGTGAT TGACTTGAGC 2880

CACCATATAA TGCAACTTCG TGACCTGCAT GTCTTAAATA CTTAGCGATA TTTGGTGTTA 2940

TATATTTACG GATAAAATCA CGTTCATTTG TTCCGTTTCC GACTGCTCCA GGATCGTTAT 3000

AACCATGACC GGCTACAAGC ataaittttt TAGGTTTAAT TACTGCTTGC TTTTTGGCAG 3060

TTGCTTGCTT AATAACGCTT TTAGCTTTAT CTCCAACACT TACTTTATCT GGGAAATTTA 3120

ATCTAATAAA ATACATTGGG TCATCGTAAT AATGAACATG TCTTGTAACG GTTTCGGGAC 3180

CCCAACCAGG TTGCGCAACG CCATTTGTCC AACCTTTACC ATTCCAATTT TGGCCAAACG 3240

ATGTGAAAGT GTTTAGATTA GCGCTCTCAA CAATTTCAAC ATGTCCaGct CCGCCACCAT 3300

ACTTTGACGG GAAAACGACA ATGTCCAACT TTTGCGGTAA AAAGCTATCA TAGTTTTTAA 3360

TTATTTGCCC GTATTTTTCA ATCCTTGCTT TATTATCAAA TGGAATATTA TAAGCGTATA 3420

AACCTTGTAA CcTTTCGCCT GTTGCTATCA TAAAAAACAT ATTTGCGTAA TCGTAACACT 3480

GAAATGGATA AAACAAATCA GGATTGAACT GCTTCCCTAA TGAATTATCA AACCATTTTT 3540

CTGCTTGGTT TTTTGTTATC AACATTGGTC AACACCTACC CTAAATCATT TGTGTCGTTC 3600

ATA'FTCGTAG GTGTCATTAC TTCTTTAATT GGCGCTTGCC CTGTTGCTTT TCTATACTTG 3660

TTTTCAGCTT TATATTTCTT TAGCTTTTGA TTTGCCCATT TACCTTCTTG AGATGTTGGA 3720

TTATCTTTAT ATGTAGTATA TAAAGCAACA ACTGTTAAGA TAATCGATGA AACACTTTCT 3780

TCATCTACTG GTATCGGACT TATACCTTTA TTCGCTAAAA ACTGATTGAC TAATGCTAAG 3840

ATCAATACGA TGTATCTTGT TATTACTTTT GCATCCATTT GTTTGCTCCT TTTATCCAAA 3900

ATAAAAAGCC AGTGCCGAAG CACTGACTCT TAACTATTAC TTACACTTAC TAAACCAGAA 3960

ACACGACCAA AAGCTATATC CTAAAATTCC CTTAAGCATG GTAATCACCT CCTTTAAATG 4020

CCAAAAATAG TTTTTAACAA GGCTATAACA AATGTACTTA GAATCGTCCC TATTAATCCT 4080

AGAATCCACA TCTTGATGTC TCTAATATTT TTAGCATTTT TCTCTTTATT TTTTTCATCT 4140
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TGCGTTCTCA GACTGTCTTC TATTCTGTCG AATTTTTCAA ACATAGTCTT ATCATTTTCT 4260 

TCTAATCGCG TTAAACGCCA ATCTTGTTCG TGTCGTTTGG TAAATCCAAA CATTACACCA 4320 

CCCACTTTAT TCAAATTAAA AAGCCATAAG ATTATAACCT ATGACTCTAG ATTTTCTGGA 4380

TACTTTTCTC CTGTAATAAT TGCATATTCC TCTTTATCTA TAACTTCCAT ATCTACATAC 4440

CACGCTATAT CTTCTTTACT ATATTCTTTC AATTGATACC ATGTTTTAAT ATCTTCGAAT 4500

GTTGGTGAAA TTAATTTAAG CATTTTCAGT CTCTCCTTTA ACCTCTTCTA ATTTTTTATT 4560

AAGTGTCACA AGTTGTTTTG CCATTAGTGC ATTTTGCTTA TTAACTTGCA TCGATAACTT 4620

TGTACTTTGA ACAACTTGTT TCTGCATACT AGCAACCATT TTTCGTAAGA TGTCATCAGA 4680

AGCGACTGTG TTTTGTTCTT CACTGTCAAT CTGTTGATGC AAGTCATCTT TTTCTTCTGA 4740

ATAATCTTCG TTAAAAACTA TTTCCCCATT TGAATATTTA AAGGCTTTAG GTCTAAAAAC 4800

TTGAGAGAAA TTTTCTGGTA AATTTTCAAT ATCAATACCT TCTTCAAAGC CACCAATGAT 4860

AGCGTATGAA ATTATCTCAT TACGCTTGTT AACTAATATT TGCATTATTT TCTCACTCCT 4920

ATAATTTTGT TAATTGTCCC TCTATTTGCG TTCGCACCAG AGCCTCTTTG ACTTCOTAAG 4980

TCGAAATAGA CATCGTTTGA TATAGTTAAA GATGTACGAC TAGATTTAGT TAATCCAAAC 5040

TCATAAACAC CTCCACCATT TCCATCACCA TCTGGAAGAT TTGAGGGATT CAATGAAATC 5100

TTTCCTCCTC CAAAAGGACT GCCAAACTCT GTAAAGTCAC CACCTGGAAA AGTCCCATAA 5160

AAAATTAATA AAATAAATTG GTCTAAACTC TCATTTAAGT ACAATGTAGA GCCCACACCA 5220

TTTGCTGTTC CATCAAAAAT AACCGAATAC CTTTTATTAA ACTTGTCATC TGCGTATAAT 5280

TTAGCGTTAC TTTCGGCCAT ATTAGCTTTT GATTGGGCAC TTTGAACAGT TTCAAAAGGT 5340

GTATTGTAAT CATTAATAGC TAATTCTGAC CACTCAGACC ATGAACCCGC TTCTTTTCTT 5400

TTAACAAATA CTTTATTTGT ACCGTTCGGT CGATAAGTCA TACGCTTGTA ATCTGAAGTT 5460

ACTACTAAAT ATTCGACAGT ACCGTTAGTA CTAACACCTC TTGGATAATT TATAGCTTGC 5520

GAAACATAAA TAAATTGGGT TGAATCACCT ATTCTTTGTT CTGGATTATT AAAATCAAAT 5580

CCAGTAATCT GCATTATCTT ACCATCATCT TTAGTAATCT TAGCTTTTTG CCAATTTGAA 5640

GTAGAACCAC TTGTGACTAA ACCACCACTA TTCACTGACT GCTTGAAGGC TTCATGTTTC 5700

TCATCCATAT ATCGCTTTTG CTCATCGAAT GTTCTTGAAT ATGCTTGCGC TTTATTTTCC 5760

AAATCAGATA TATGGCTATT AGCAAGTTGC TTTAATTCAT CTATACTTGA AGATTTTGCT 5820

ATTTGAATAT CTGATAGACC TTTTTCTTTA GCTTTTTCAA TCAGACTCGC ATAATCTTCA 5880

CCATTTTTTA TAGCCTCGTC CATTGCTTTC GCACGATCCA TAATAGTTTT TTCTAATTCC 5940
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TCAACGTTAA ATGTGATAGT TCTCTCGACA ACTACCACGT CTGAATTACC TAATTCTGCA 6060 

ACCGAAACTT GAGCTTGATA ACTTCCATCT CGTTTAATTA CATCATTAGG TAATTGAAAT 6120 

TTTAAAATAC CTTTAAATGG ATCTAATATT TCTAGTGGAG CAACTACCAT GACTCCTTTA 6180 

CCTCGAATCG CTATTCGTGC kTTGATATTT tCTTCACTCA ATAATAACGG TTGATTATTT 6240 

TTAGTGATAT TAAAAAGAAG AACAGAAGAA TCACTCTCTC CTGTTCTAAA AGTTATATCT 6300 

AGATTTGAAA TATTTCCATA ATGCGCTGTG TTTTCTAAAT TTATAGCTAC AGATTTCTCT 6360 

AAATTACTCA TTAACTTATA ATTCTCCCTT CGTGTAAAGT CCATGGCCCT GAACTTGTTT 6420 

TACTATCATA ATTTTTCAAT AGTATCTCAG CAGATGCTGT AACACTATTA COAACTAGCC 6480

TATGAACAAA GCCACCTGTG TTTGAAGCTT CTACATATAA GTTCCAACCA GCTACCCCTT 6540

TACGTTCAGT TGGAAAATCT GTAAAACGTT TTGTATCATC CGTAGTTAAA TAAAACGACA 6600

TGCCTACTAT GTTAATATCT GACATTTTTG TGATGAATGA AGGTACTCTC TCCCATTTAC 6660

CACTATTTTT AGGCACATAA TTCCAGTCCG AAATGTCTCC AGTTCTTCCA GAAAGCACCC 6720

TTTCAAAAGT CATCATATTC CTTGCATAAC TATTACGCGT CAATATCTGA ATTACATCAC 6780

CGCCAGTTTG TGGTGGCTTA ACTTCCAAGA ACCAACCTGC ATCACGCCAT TCTCTTGGTA 6840

ATGGGAAATC ATCGATTTGA ACTGTATGAT CAGTGTATAA ATAGTAAAGA CCTGGCTCTG 6900

TTAACATCCC AAGATTCTTA AGTTTATCAG GCCTCATTGG TAAAGGTTTA ACTCTACCAC 6960

CTGTGTCACT CaTGATAAAA GGAACGCCTC TTGAGTGAAG TATTTCTAAA ATACCTCTTT 7020

GCCCAATGAT GAAAATACGA TGTGTTCTAT TTCCaTCACC ACCGACAGTA ACACCTAGCA 7080

TCAAAGCTTT TTTACCACTA TCTTTGTCAT AGTATATTTG CAAACCTTtC TgCTTCCGCA 7140

AATTCGCCAG GAAATGAATC tAgTGTTCCA CCATAGTCAG CATTAACCTG ATACGCTTCT 7200

TCTCCTGTTT CTAAATCGAA AGCCGTTAAA TAGTTTCTAT TATTTGGATT ACTGTCTCCT 7260

GTATACCAAT ACAAGTATTT TTCATCAAAA GTCACACCCT GCATTGGTTG GGTTTCGTTT 7320

GTTAGTCTCA TAGGGATACT GATTTTATGC AAAACTTTAT CAATATTTTT ATCAACATCG 7380

TCTAAACTTC TTATCTCTAT ATAAnTCATT GAGTTTTCAA GTTCCCACTG ACTTCTAGGT 7440

CTCTCaATTC TGTATAGAAT TTTATTTTCT TTTTCATTTA TGACAGGGGT GATGTAGGGT 7500

TTTTCTGGGT GTCCTGTAAA TACATCTTGC ATACCATACT TGCCATAGCT AATTTCCACA 7560

TTAGGCGTAT ACTTGAAACG AACTAATGTA TTCTCATTAT TACCATTTAA GATAAAACTA 7620

TAAATCCATA ACTCATcATC AATATATCTA TAACCGTTAT GTGTACCATG ACCCCCACCT 7680

ACAATCAATG AGCTGTCTAT AAATTGACCA TTAGGTCTTA GACGACTTAG CATATAGCCA 7740
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30 



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20 



ATTACTGCAT TTGTAAgAGG TGCAAGTTCT GTCACAAATA AAAATTCTTG CTTATCAGGT 7860

TCAAAACGAT ACTCGATATC AAGAATTTCT TGTTTGGTCT TATTTAATTC TCTTATAGTT 7920

TCCTCTTTAT TAATTTGAGT TTTGGTTTCC CAATCGTCTA AATGTTCTTT TAATGTGTCA 7980

AAGGTTTCGC CGTTTACATT AACTCGAGCT TGAACAATCT CATTAGCACT GTTATTACGT 8040

GGTGCCACAA CAAGTGCGTT AATTTGACTT TGTAAAGATT TGTTTACTGC TGCTTGCGAT 8100

CTACCATTAT AATAAATTTG CTCAGCGAAG TGTTGAATTG TTTTAGCTyT CTGATGCAAC 8160

TTAAACTCTG TTGTCAAGCC AAGCGCAAAT TGCTCTATTC TTTGTAAGTT TTGTATTTCC 8220

TTAGCTCTAT AATCTCGACC TGCTAAAGCT CCCAAATCCT TTATTAAATA CAAATTTTCC 8280

ATAATGCACC TTCCTTTCTA ATAAAATAGC ACTGTACCAA GTTTCCCACT ATCGTCAACT 8340

GTTATTTTCC ACAATTTACC GTTTGGGGAT TTCTGTACAA TGCTATTTTG AATAATTgcC 8400

TGctTCGCCT ATTTTTAAAT TATCTAATTT ATTTkTATCA TTTACCGAAA TGATACCGTC 8460

TTGAGGCAAT CCATCAATAn CACTACTGCC TGCATAAGGT ATCCCATTTA TAGCTTTCCA 8520

ATGTGTAGCT GGAAAGTACT GTTTATCGT 8549



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3601 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

AGGCGTGTAG TGACTTACGG nTAGGAAACT ATGTATCCGA ATGATTTATT GAGACCAAAA 60

AGGCATTAAA GTCCATTGAA ATATCnGGTA GCGmGTTGGT ACgTGGACGT GGGGGCCCTA 120

GATGTATGAG TCAACCATTA TTCAGAGAGG ACATTTAACG TAATAAATTA TAGAmACGAG 180

GGTGAAAATA ATGACAGAAA TTCAAAAACC GTATGATTTA AAAGGCAGAT CATTATTAAA 240

AGAAAGTGAT TTTACCAAAG CAGAATTCGA AGGACTTATT GATTTTGCAA TTACATTAAA 300

AGAGTATAAG AAAAACGGTA TTAAGCATCA CTACTTATCT GGAAAAAATA TTGCACTACT 360

ATTCGAAAAG AATTCGACGA GAACGCGTGC TGCGTTTACA GTTGCGTCTA TTGATTTAGG 420

TGCGCATCCA GAATTTTTAG GAAAAAATGA TATTCAATTA GGCAAAAAAG AATCTGTAGA 480

GGATACTGCG AAAGTATTAG GTAGAATGTT CGATGGTATT GAATTCCGTG GTTTTTCACA 540

ACAAGCTGTT GAAGATTTAG CGAAGTTCTC TGGTGTACCG GTGTGGAATG GATTAACAGA 600
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TCTAGAAGGA ATAAACTTAA CTTACGTTGG AGATGGACGT AATAATATTG COCATTCATT 720 

AATGGTAGCA GGTGCTATGT TAGGTGTTAA TGTAAGAATT TGTACACCTA AATCATTAAA 780

TCCAAAAGAG GCATATGTTG ATATTGcAAA rGAAAAaGCG AGTCAaTATG GTGGTyCAGT 840

CATGATTACG GATAATATTG CAGArcCAGT TGAAAaTwCm GATGCTATAT ATmCAGATGT 900

TTGGGTATCG ATGGGTGAAG AAAGTGAATT TGAACAcGTA TTAATTTATT AAAAGACTAT 960

CAAGTGAATC AACAGATGTT TGATTTAACA GGTAAAGATT CAACGATATT CTTACATTGT 1020

TTACCAGCAT TCCATGATAC AAATACACTT TATGGACAAG AAATTTATGA AAAATATGGA 1080

TTAGCTGAAA TGGAAGTTAC AGACCAAATC TTTAGAAGTG AACATTCAAA AGTGTTTGAT 1140

CAAGCTGAAA ATAGAATGCA TACAATTAAG GCAGTAATGG CAGCAACATT GGGGAGTTAA 1200

TCACTAAATG GAACGATATG AATATGATGT GTCTGATGAT ATAAGTGTCA TGTACAGACA 1260

CCTCATATTG GTATTAAAGG AGAAATGAAT ATGAACGAAT CAGGAGATAA CAAACTCAGT 1320

AAATCTTCTT TAATTGGACT AGTTATAGGA TCCATGATTG GTGGCGGTGC GTTCAATATA 1380

ATGTCTGATA TGGGCGGTAA AGCCGGTGGA TTAGCCATTA TTATTGGTTG GATTATTACA 1440

GCTATAGGAA TGATTTCATT AGCGTTCGTA TTTCAAAATT TAACCAATGA ACGGCCGGAG 1500

CTAGACGGTG GTATTTATAG TTATGmTCAA GCAGGATTTG GCGATTTTGT AGGATTTATC 1560

AGTGttiTTGGG GATATTGGTT CTCAGCGTTT TTAGGCAATG TTGCCTATGC AACACTATTG 1620

ATGTCAGCAG TAGGTAACTT TTTCCCGATT TTTAAAGGAG GCAACACATT ACCAAGTGTT 1680

ATTGTCGCCT CGTTACTACT CTGGGGTGTC CATTTCTTGA TTTTAAAAGG CGTTGAAACA 1740

GCAGCATTTA TCAATAGTAT TGTTACTGTT GCAAAGTTAA TACCGATTTT ACTTGTAATC 1800

ATATGCATGA TAATTGCATT CAATTTTGAC ACTTTTAAAA CAGGCTTTTT CAGTATGACG 1860

TCAGAGGGTG TATTGCCATT TAGTTGGGCG AGCACAATGA GCCaaGTtAA AAGTACGrTG 1920

CTAGTGACAG TTTGGGTGTT TATCGGTATC GAAGGTGCAG TAATTTTTTC TAGTAGAGCT 1980

nAAAATGAGA AAGATGTAGG TAGTGCCACG GTTATAGGAC TTATATCAGT TTTAATTATC 2040

TATyTCTTAT TAACTGTATT AGCTCAAGGC GTGATTTTGC AAAATCATAT TTCGCAATTA 2100

GATTCGCCAA GTATGGCACA GGTGCTTGCA ACTATTGTAG GTGGTTGGGG ATCTACACTT 2160

GTAAATATTG GTTTAATTAT TTCGGTACTA GGTGCATGGT TAGGATGGAC ACTGCTTGCT 2220

GGTGAATTAC CTTTCATTGT TGCAAAAGAT GGATTATTTC CAAAATGGTT TGCTAAAGAA 2280

AATAAAAATG GAGCACCTGT AAATGCACTG CTTATTACCA ATATATTAGT ACAATTATTT 2340

TTAATAAGTA TGCTATTTAC ACAGAGTGCG TATCAATTTG CATTTTCACT AGCATCAAGT 2400
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CGACAGCAAG CAACTACTAA ACAATGGACG ATTGGTATCA TAGCCTCAAT TTATGCTATA 2520 

TGGCTTATAT ATGCAGCAGG TATCAATTAC TTATTATTGA CGATGTTACT TTATATTCCA 2580

GCTCTTCTTG TTTATACaAT CGkTCmAAAG rATwATCAGa CACGTTTGAT TAAATCAGrC 2640

TATATTCtTT TTATGATTAT tATCGTACTT GCAGTTATCG GGTTAATTAA GTTATTGATG 2700

GGAACGATAA ATGTTTTTTA AAAGGAGOGA CAAAAATATG AAAGAGAAAA TTGTCATTGC 2760

ATTAGGCGGT AATGCGATAC AGACAACAGA AGCAACAGCT GAAGCACAAC AAACAGCTAT 2820

TAGATGTGCG ATGCAAAACC TTAAACCTTT ATTTGATTCA CCAGCGCGTA TTGTCATTTC 2880

ACATGGTAAT GGTCCACAAA TTGGAAGTTT ATTAATCCAA CAAGCTAAAT CGAACAGTGA 2940

CACAACGCCG GCAATGCCAT TGGATACTTG TGGTGCAATG TCACAGGGTA TGATAGGCTA 3000

TTGGTTGGAA ACTGAAATCA ATCGCATTTT AACTGAAATG AATAGTGATA GAACTGTAGG 3060

CACAATCGTT ACACGTGTGG AAGTAGATAA AGATGATCCA CGATTTGATa ACCCAACTAA 3120

AcCAaTTGGT CCTTTTTATA CGAAAGAAGA AGTTGAAGAA TTACAAAAAG AACAGCCAGA 3180

CTCAGTCTTT aAAGAAGATG CAGGACGTGG TTATAGAAAA GTAGTTGcGT CACCACTACC 3240

TCaATCTATA CTAGAACACC AGTTAATTCG AACTTTAGCA GACGGTAAAA ATATTGTCAT 3300

TGCATGCGGT GGTGGCGGTA TTCCAGTTAT AAAAAAAGAA AATACCTATG AAGGTGTTGA 3360

AGCGGTTATA GATAAAGATT TTGCTAGTGA GAAATTAGCA ACGCTGATTG AAGCAGATAC 3420

CTTAATGATT CTTACGAATG TAGAAAATGT ATTTATTAAC TTTAATGAAC CTAATCAACA 3480

ACAAATCGAT GATATTGATG TAGCAACACT GAAAAAAtAC GCGGCACAAG GTAAGTTTGT 3540

GGAAGGATCG tGTTGCCAAA AATAGAAGCT GCGtACgtTT GTTGAaAGtG GGGaAACCAA 3600

A 3601

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
CGACACTATT AAATGAATTA GAGCACAATC TAACAAATCA AATTCATTTT TCAAAAGATG 60

AACGACTCAC ACATATCGCT TTAAAGTTAT TCGAAACAAC CGATCCTGTT TCAACAAAGC 120

AACTTGCGCA AGATGTTAAT GTTTCGCGTC GGACAATTGC AGATGATATT AAAATGATTC 180 
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TTATTGGTGA GGAAGATCAT TATCGTAAAG CGTATGCACA CTTTATACAT CAATATATGA 300 

AACAAGCTGC ACCTTTTATA GAGGCGGATA TCTTTAATTC AGAATCAATC GCATTGGTTC 360

GCCGTGCCAT TATTAAGACA TTAAATAGTG AAAATTATCA TTTAGTTCAG TCGGCTATCG 420 

ATGGCTTAAT CTATCATATA CTCATTGCCA TTCAGCGTTT AAATGAAAAT TTTTCGTTCG 480 

ATATACCTAT CAATGAAATT GATAAATGGC GACATACTAA TCAGTATGCn ATTGCTTCAA 540

AAATGATAGA AAACTTAGAA CGCAGTGTAA TGT 573 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1221 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

TTGATATTTA TAACGTTATA TTTTAATAGT TCACCTGGAT TATTAAATAA ATAGTCCGCC 60 

AAATTTTCTT TTTCTTTATC AATCTGaTkG TAATTAACaC TTTCGaCTTC TGTAGGAATT 120 

CTAATGTCAA CAGAAGCATT GATATAAGCT TGATGTTGCA TGCAATCACA CTCCTAATCC 180 

TTCATmTmAA ACGGAGAAGT AAACCCGTCA CTATTCAAAT TCAATCCTTT TGCCCAATCA 240

ACAGGCTTAT TCATGATAGT TTCGATTTCC TTAAGTCCAT TTGAACCTCT AGGTATTTCT 300

ACAATTACTT CATCATGGAC ATGGCCAACT ATTTTAAAAC CTAATGCTTC AAGCCTTGCT 360

ATAGAAATCG CAAGTAAATC CCTTGCAGTT GCTTGAACAA TATTCTCGAC TAACTTCCCA 420

CCATACGTTT TTAACTTTGA CCATTTACGG TTAAGATCTA ACCCCATAAA TTCAACAACT 480

TGAC^ACCCC AACTATTTTC ACCAACTAAA GCTTTTGGAT AAGCTAAAGC TCTTCCACTA 540

GGCAGTTCAA TCATTAGAAA ACCTTTTTTC ATATAAAATC TAAGTCCATG TGTATGATGC 600

GTCTTTCGGG ATTTTACAGT ATTAATTGCA GCCTCTTGGC AAGCCTTCCA AAAATTAACT 660

ATGTTAGGAT TTGCGTTACG CCAACTATCA ACTAAACCTT GTAACTCGTT TTCTTCAATG 720

CCCATTTCCA ATGCACCCAT TGCTTTTAAA GCTCCAGCGC CACCTTGATA GCCTAAAGCT 780

AATTCGGACA CTTTTCCTTT TTGTCTGAGA GGGTCGCCTT TAGTTATGCT TTCTACCGGT 840

ACATTAAACA TTTGAGAAGC CGATGCTTCA TATATCTTTC CGTGTGTGTT GAATACATCT 900

AAACGCCATT GTTCTTTTGC ATACCATGCT ATGACTCTTG CCTCTATTGC AGAAAAATCA 960 

CTTACTGCTA GTTCATTACC TTCTTCAGCA GTAAATGTCG TCCTAACTAA TTGACTTAAT 1020 
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AGATCTCTTG CTATTTCTAA TTCAGTATCT GAAATATAAT GCTTTGTTAA ATTCTGAAGT 1140 

TOTACACCTC TACCTGCCCA TCTTCCAGTA CCGGCACCGT AAAATTGAAA CAGACCTCTT 1200 

ACCCGTTCAT CACTGCACAT C 1221 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1090 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

TTTTGTTTGG TATGAGGTAG CAATGACGAC GTGTCATTGG TGGAGATTGT AAAAATACAT 60 

AATAAAAAGA AGCGGCAATG TATACCGCTC CTTTTTTATA CTACATACCG ATTTTCAACC 120 

ATCTCTTTCT ACTTAGTAAT AAGACAATAG TATTAACTAT AAATAGAAOA ACGAAGAATG 180 

ATACTATATT TATAATTTCA GTAGGACACA TAAATGTTGA CTCGTTATTC AATATTTTTT 240 

CTACGGCACG ATACATCGTA TTGCTCGCCT CAAATGGAGC AACGATACCA AATATATTTT 300

TATTAATGGC AACTAAGATG ACTGAACCAA TCCAATATAC AATGCTGATA CCTAAGCTGA 360

TTAAAATGTT AGGTGAAACC ATACTAATCG TTCCAACAAC TAAGATATAT TGTAAGATAA 420

CGAGTGAAAA TAAGATTATT AATAGTAAGT AATGTGAGAA ATCCGAATAT ATAATTGAAA 480

TAATAGTGAT ACTTAGAATT ATGAACACTA AACATTCAAA AAATAACACT GCTACCTTTT 540

TATAGAAGAA GGTAAAGATA TTATCGCCAA TCAATTTATA AAACAGGATA TTTTTATTCG 600

AATACTCTTT ATTAATAAAA TATGCAATAA CAAATGAAAA TAGTAAGAAC CCTAATTGCG 660

TTGCAACAGT ATATGAACTG AAGAAAAACT GGCTATAGCT TAAACTTTTA ACTTTGTCTA 720

TACCTATTGG TAAAAAATAC CCAAGTAAGA AAAGGAATGT GAATAGCACA ACAAGCGTGT 780

AAATAATTTT ATTGGAAATA CTTTTTTTAA ATTCTAATTT CAAAGTGGAC ACCTCAATTA 840

TAAATTAATG TAATCATTTA TGACTTCTTC TTTTGATTGG TACTCTTCTA TTTGAAGGTC 900

TTTAAAAATA AAGTATTTAC CCGGCAAAGC ACTTAAATCG GATAAATTaT GTGTAATATT 960

GATAATAGTT TTAGTTTGAT GGCTTTGAAT AAAATCATTT AAAAATTCAT AAATTTCATT 1020

AACTGTTTTC TTGTCTAAAG CGTTTGTAAC TTCATCTAAT ATGATTAAAT CATGATCTTC 1080

CAATAAGAAA 1090 
(2) INFORMATION FOR SEQ ID NO: 10: 
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(A) LENGTH: 904 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

TTAGGACTAT TTTATCATAT TCATTTAAAT TACGGCTAAA AATTTTAAAA ACGGGGATTA 60

ATATATGGAA TTAAGCTATG AAAGTTAATT GATACTTGCA TTTTACGCTG ATTTATATAA 120 

GAATAACTAT TGTATAGTTT TAAAAACGAA CGTACGTTTG CAGGAGGCGA AATCATTGGC 180 

AATOAATAAA CAAAATAATT ATTCAGATGA TTCAATACAG GTTTTAGAGG GGTTAGAAGC 240

AGTTCGTAAA AGACCTGGTA TGTATATTGG ATCAACTGAT AAACGGGGAT TACATCATCT 300 

AGTATATGAA ATTGTCGATA ACTCCGTCGA TGAAGTATTG AATGGTTACG GTAACGAAAT 360 

AGATGTAACA ATTAATAAAG ATGGTAGTAT TTCTATAGAA GATAATGGAC GTGGTATGCC 420 

AACAGGTATA CATAAATCAG GTAAACCGAC AGTCGAAGTT ATCTTTACTG TTTTACATGC 480 

AGGAGGTAAA TTTGGACAAG GCGGCTATAA AACTTCAGGT GGTCTTCACG GTGTTGGTGC 540 

TTCAGTTGTA AATGCATTGA GTGAATGGCT TGAAGTTGAA ATCCATCGAG ATGGTAATAT 600 

ATATCATCAA AGTTTTAAAA ACGGTGGTTC GCCATCTTCT GGTTTAGTGA AAAAAGGTAA 660 

AACTAAGAAA ACAGGTACCA AAGTAACATT TAAACCTGAT GACACAATTT TTAAAGCATC 720 

TACATCATTT AATTTTGATG TTTTAAGTGA ACGACTACAA GAGTCTGCGT TCTTATTGAA 780 

AAATTTAAAA ATAACGCTTA ATGATTTACG CnwGGgTAAA GAGCGTCAAG AGCATTACCA 840

TTATGAAGAA GGGAtCaAAG rGTTgTTAGT atGTCCAaTG ArGGAAAAGA AGTTTTGCCT 900 

GACG 904

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11271 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GATTTCTAAA TCAAGATCTG TTTTACGATA ACCATTCAAA CCTTGACGTT CATCTTCTTC 60 

AGGTTGATTT TGTTGCTGTG TGTCTTTGTT GTCAGAAGTC GCTACTGTTT TTTTATTATC 120

TGTTTCTTTA GTCATAACAA ACGCCTCCGT TATAAAACGC TATATTTAAT GATATGTGAT 180 
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TTAATAAGAC GATTCAGCAA GTTTTAAAGT ATTATTTGAC TATGTTGGAT TAGGCATCTA 300 

GTCCTATAAT ATCACTGACA TTGTCAAAAT GATGATCTTT TAAGTAACGT GCGATGCCTT 360

TGTTCATTTT CTTAGTTAAA CCTGGGCCTT CAATAACAAG TGATGAATAA ATTTGAATAA 420

GTOACGCACC GTGACGCATC ATTTTGATTG CATCTTCAGT ACTGAATACG CCGCCTGTAC 480

CTATAATTAA AAATTCACCA TTTGTTTGCT GATAAgCATa CTTAATCAAT TTTAAATTAC 540

GTTCAAATAA TGGACGACCA CTCAAACCGC CTTCTTCGAC TTTATTAGCA GAAGTTAAAC 600

CATCTCGTTG TCGCGTTGTG TTTGCTAAGA TGATACCGTC AAATGTCTCA GTAATCGCTG 660

GTAATAGTGC TTTTAAGCCA TCGAAATCCA TATCAGACGT TAGTTTTAAA TAAATTGGCA 720

CTGTTACATC ATGTTGTTTT TTAAATGCTG TTAAAGCTTG GCATAACATT GAAAATTCAT 780

CTTTATCATG GAAGTTTTGA AGATTTTCAG TATTTGGAGA ACTGATGTTG ACTGTGAAAA 840

ATGAAACGTC GTGTTTAAAC GTATCAATAA CCTTTATATA ATCTTGATAA CGCGCTTCAT 900

AAGGTGTCAT TTTATTCACA CCAAGATTGA TACCAACAGG TACTTGATAA GCATTTTTAC 960

GCAAATGACT TAGTGCTTTG TTCATACCAA TATTATTGAA GCCCATTCGA TTTATCAAGG 1020

CGTCATCTTC TAATAATCTA AACATGCGTG GTTGAGGGTT ACCCGGTTGA GGTTTAGGTG 1080

TGATACCACC TAATTCTAAA GCACCGAATC CAAGGTGTTC CAATGCTTTT GGTACTTCGC 1140

AAGATTTGTC GAAACCAGCT GCTAAgCCAA TTGGATTGTC GTACGTATTA CCTTGTATCG 1200

TTTGTGATAA CGTTGGATTC TTATAAGTAA ATAGTTTATC GACGACTGGG AATAAAACCG 1260

GaAACTTTTG TaACGTTTTT AATGCATCGA TAGTTAGTCC GTGTGCTTTT TCGGGTTCGA 1320

TTTTGAATAA GAAAGGTTTA ATTAATTTGT ACATGAGTAT GCTCCTATTT CATTATATTT 1380

GAGGCTTACT ATCCTCAACT TAATATATGT GAAATATATT CTTTTAATAG ACTAGCATTT 1440

CCATACATAA TTTCCTAGTT AAAACTAAAA AGTTTTGAAA ATTGACGCAA gTTTGAATAA 1500

CGTTTTTAAG ATTAAATCAT CCTAATTAGG CAATATTATA GTATAAAGTA AGTAGATTGG 1560

AAGGTGTTTG TATGAATGAA CAATGGTTAG AGCATTTACC TTTAAAAGAT ATTAAAGAGA 1620

TTTCACCAGT GAGTGGTGGT GATGTAAACG AAGCATATCG AGTCGAAACA GATACGGATA 1680

CATTTTTCTT ACTTGTCCAA CGTGGACGTA AAGAATCATT TTATGCTGCA GAAATTGCAG 1740

GTTTAAATGA ATTTGAACGT GCAGGTATCA CGGCACCTAG AGTAATTGCA AGTGGCGAGG 1800

TTAACGGTGA TGCGTATTTA GTGATGACGT ATTTAGAAGA AGGGGCTTCA GGGAGTCAAC 1860

GCCAATTAGG GCAACTCGTA GCTCAATTAC ACAGTCAGCA ACAAGAAGAA GGCAAATTTG 1920 

GCTTCTCATT ACCTTATGAA GGTGGCGATA TTTCTTTTGA TAATCATTGG CAAGACGATT 1980 
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GGCTATGGGA TGCCAACGAT AT CAAAGTAT ATGACAAAGT GCGACGTCAA ATTGTGGCGG 2100 

AATTAGAAAA GCATCAAAGT AAACCGTCTT TATTACATGG TGACCTATGG GGTGGTAATT 2160

ATATGTTCTT ACAAGATGGT CGTCCGGCGT TATTTGATCC AGCGCCATTA TATGGTGACA 2220

GAGAATTCGA TATCGGTATT ACAACGGTAT TTGGTGGTTT TACGAGCGAA TTTTATGATG 2280

CGTATAATAA ACATTATCCA CTCGCAAAAG GTGCATCCTA TAGACTTGAA TTTTATCGTT 2340

TATATTTATT GATGGTCCAT TTATTGAAAT TTGGTGAGAT GTACCGTGAT AGTGTTGCGC 2400

ATTCTATGGA TAAGATTTTA CAAGATACAA CAAGTTAGTT AAGACGTTAG ATTGAGATAA 2460

ATAGATAATA TGCACAOATA TTTTTACAAT GAGAAGCGAT ACAGCTGCCT CAATAAAAAT 2520

ATTTGTGCGT TTTTATTGTT GGAAAATAAA ATTTTAATCG CTATTGTTAA TTTCTGTAAT 2580

GTAAAACAAG GTTGAGTTAC AATAAAAGTG ATTTTATAAC TTTTTGTTCA ATAAAATTCT 2640

AGGAATGATA CATATTTATT GATACAATAA TTTTGAATAT AATCATAAAA CAATATTTAA 2700

GTATAATTGA ATGTTTGAAT ATCATATATT GATACAGTTT CTAATAATTT TAAAATAATT 2760

TAAATGGAGA GAGGTGTAAA TGATGAGTAC AGTTCAAAGT GATATTTTTA AGACCAATAG 2820

TGCATCATCA TCTATTAAAA GCGCTGTTGA AACATGTAAT AATGTGTCGA AACCGGATAA 2880

AGATGAAAGT ACAACAGTAA GTGGAAATAA TAATGCTCAT AGTGTGATAG ATGATTTGAT 2940

GAGTAAGAAT CAATCTGTTG CTGAAGCAAT ACGAACTGCG AGCGATAATA TACAAAAAGT 3000

TGGTGAGGCT TTTGACCAAA CTGACGTAAT GATTGGTAAT GAAATTGGTA AAAATTAAAA 3060

CGTGGTGAAA TGATGTCGAA TAAACTGGAT GAAATCAATA AAATAATCAC AGCGAAACAT 3120

GAGCAAATGG ATGACTTATA TGATGAAAAG CGAGAGGTTA AAGCATTGAT AGATGAAAGT 3180

GATGCGCTTA ATCATTCGAT AGATCAATTA TATCAACATT TAGGTGAGCG TTATTATAGT 3240

AGCAATATGG CTAGTCGTAT GGAACAGTTC CGCGATGAAT TTCATTTTGC GAAACGACGT 3300

TCAACGGAAG CGTTATACGA GCAGCAACAG CAAATTCAAC ATGGCATTCG TAAAGTGGAA 3360

GAAGAGATGA TTGACTTGGA AATGCGAAGG AATGTTGAAA TTGAGACGGT GACAAAGGAG 3420

GAAAATAAAT GGAAACAATA GGAAGCATTA TTTATTTAAA AGAAGGTTCG CAAAAGTTAA 3480

TGATTATTAA TAGAGGmCCA aTTGTAGAAA TTGAAAATCA AAAGTATATG TTTGACTATT 3540

CTGCATGTAA ATATCCGATT GGTGTTGTAG AAGATGAAAT TTATTATTTT AACGAGGAAA 3600

ATATAGATTC AGTTATTTTT AAAGGTTATT CTGATCAAGA TGAGGTTAGA TTTCAAGAGT 3660

TGTTTGAAAA TATGAAACAA AATTTGGATA GTGAAATACA ACGTGGAGAA GTTACACAAC 3720

AATAAAGAAA TACTTTTTCT TTATTGGGGT GGGACGACGA AATAAATTTT GTAAAAATAT 3780 
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ATGTCATTCA 


TAATCATTTG AACTAAACGT AGCAGCCTTA AATTTTAAAA AAAGACACAT 3900

ACCAACTTCC GAAATGTAGA TGAATTCTCT ACAATAACGG AAGTTTTTCT TTTAATATTG 3960

AAATTTCTCA AGGATAGGTC TATACTTTAT AAATCGTAAT TATTACGATT TATAATCAAA 4020

AACAATAACT TGAAATAGAT CATTGAGGGA GTGTTAATAT GCAACATCAT AAAGTGGCTA 4080

TTATcGGTGC CGGTGCTGCA GGTATAGGTA TGGCCATTAC CTTAAAAGAT TTCGGTATAA 4140

CAGATGTCAT TATTTTAGAA AAAGGAACAG TAGGACATTC ATTTAAACAT TGGCCGAAAT 4200

CGACCCGTAC GATCACGCCA TCATTTACGT CTAATGGATT TGGCATGCCT GATATGAATG 4260

CAATTTCCAT GGATACTTCA CCAGCATTTA CATTTAATGA AGAACATATT TCCGGAGAAA 4320

CATATGCTGA ATATTTACAA GTGGTTGCCA ACCATTACGA GCTGAATATC TTTGAAAATA 4380

CAGTTGTCAC AAATATATCT GTAGATGATG CATATTATAC GATTGCAACG ACAACAGAGA 4440

TATATCACGC GGATTATATC TTTGTCGCAA CAGGTGATTA TAATTTCCCT AAAAAgCCAT 4500

TTAAATATGG TATTCATTAT AGTGAAATTG AAGACTTTGA TAACTTTAAT AAGGGGCaAT 4560

ATGTGGTTAT CGGAGGTAAT GAAAGTGGCT TTGATGCTGC ATATCAACTT GCAAAAAATG 4620

GCTCTGACAT CGCACTTTAT ACTAGCACAA CCGGTTTAAA TGATCCGGAT GCTGATCCTA 4680

GTGTTAGATT GTCACCTTAT ACACGTCAGC GACTAGGTAA TGTCATTAAG CAAGGTGCTC 4740

GCATCGAAAT GAATGTACAT TATACAGTTA AAGATATTGA TTTTAACAAT GGACAGTATC 4800

ATATCAGTTT TGATAGCGGA CAAAGTGTGC TTACACCTCA TGAACCAATA CTAGCAACTG 4860

GCTTTGATGC AACAAAAAAT CCAATCGTTC AACAATTATT TGTGACAACA AATCAAGATA 4920

TTAAATTAAC AACACATGAT GAATCGACAC GTTATGCGAA TATTTTTATG ATTGGTGCAA 4980

CAGTTGAAAA TGATAATGCC AAATTATGCT ATATCTATAA ATTTAGAGCG CGATTTGCAG 5040

TAC&GCACA TCTTTTAACA CAGCGGGAAG GcTTACCAGC TAAACAAGAT GTCATTGAAA 5100

ATTATCAAAA AAATCAAATG TATTTAGATG ATTATTCATG TTGTGAAGTG TCATGCACAT 5160

GTTAGAAGTG AAATATGATA TGAGAACTGG GCATTATACG CCCATACCTA ATGAACCTCA 5220

TTATTTGGTT ATTAGTCATG CGGATAAACT TAGCGCAACA GAAAAAGCGA AATTAAGATT 5280

ATTAATCATA AAACAGAAAT TAGATATTTC ATTGGCAGAA AGTGTAGTTT CTTcGCCTAT 5340

AGCGAGTGAA CATGTGATAG AACAATTGAC ACTATTTCAA CATGAGCGAC GACATTTAAG 5400

ACCTAAAATA AGTGCGACAT TTTTAGCCTG GTTGTTGATA TTTTTAATGT TTGCATTGCC 5460

AATCGGTATC GCTTATCAAT TTTCAGATTG GTTTCAAAAT CAGTATGTGT CAGCATGGAT 5520

AGAATATTTA ACTCAAACAA CATTGCTCAA TCACGATATA TTACAGCATA TATTATTTGG 5580
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ATTGATTAGT TTATCAACTG CTATAATTGA TCAAACAGGA CTCAAATCAT GGATGATATG 5700 

GGCAATTGAA CCGTCAATGT TATGGATAGG ATTACAAGGT AATGATATCG TGCCACTATT 5760 

AGAAGGGTTT GGATGTAATG CAGCAGCTAT TTCACAAGCA GCACACCAAT GCCATACCTG 5820

CACGAAGACA CAGTGTATGA GTTTAATAAG CTTTGGrAGT TCTTGTAGTT ATCAAATAGG 5880

TGCGACATTA TCTATTTTTA GTGTAGCTGG AAAGTCATGG CTATTTATGC CGTACTTAAT 5940

ATTAGTACTT TTAGGTGGCA TCTTACATAA AGGATATGGT TGAAAAAGAA TGATCAACAA 6000

CTTAGCGTTC CGCTACCTTA TGATAGGCAA TTACATATGC CAAATATACG TCAAATGTTG 6060

CTACAAATGT GGCAAAATAT ACAAATGTTT ATCGTTCAAG CGCTACCTAT TTTTATCACA 6120

ATCTGTCTTA TTGTTAGTAT TTTATCACTA ACGCCAATTT TGAATGTTTT ATCACAAATA 6180

TTTACACCTA TATTATCGTT ATTAGGCATC TCGTCAGAAT TGTCACCAGG GATTTTATTT 6240

TCAATGATTC GAAAAGACGG CATGCTCTTG TTTAATTTGC ATCAGGGCGC CTTATTACAA 6300

GGAATGACAG CAACACAGTT ACTACTACTT GTGTTTTTTA GTTCAACATT TACAGCGTGC 6360

TCGGTCACAA TGACGATGCT TTTGAAACAT TTAGGTGGTC AGTCAGCACT AAAATTAATT 6420

GGAAAGCAAA TGGTGACATC ATTGTCTTTA GTTATTGGTG TAGGCATCAT TGTTAAAATA 6480

GTAATGCTGA TTATTTAAAA AAAATGAACT ATAACTGAAT ATAGAGTCAT GTCAGTCAAT 6540

AGGAGATCTA TCTTGGAATA TGCTATTCAT ATGAAGTATA AGAGGAGAGT CGCAGATGAA 6600

AATAGTTATT ATAGGTGGGT TTTTAGGTGG CGGTAAAACG ACTGTCTTAA ATCATTTGCT 6660

CGCTGAATCA TTAAAGGAAT CGCTGAAACC AGCAGTCATC ATGAATGAAT TTGGGAAAAT 6720

GAGTGTTGAT GGTGCCTTAG TATCTGAAGA CATACCTTTA AGTGAACTGA CAGAGGGGTG 6780

TATCTGTTGT GCAATGAAAG CAGATGTATC AGAACAGTTA CATCAATTAT ATTTAAAAGA 6840

GCAACCAGAC ATTGTATTTA TTGAATGTAG TGGGATTGCA GAACCGGTCT CTGTCTTAGA 6900

TGCTTGTTTA ACGCCTATTT TAGCTCCGTT TACAACAATT ACACATATGA TTGGTGTAAT 6960

AGACGCAAGC ATGTATAAAC ACATTAAATC ATTCCCTAAA GACATCCAAG GCTTATTTTA 7020

TGAGCAATTA GCATATTGTT CTGTCTTATT TGTTAATAAA ATAGATTCAG CAGATGTTGA 7080

AACAACGAGC AAACTATTGA AAGATTTAGA AGTTATTAAC CCAGAGGCCG ATATACAAGT 7140

CGGTATGCAT GGCAGCGTCA CTTTGCCAAT ATCAGTTAGA CAAATGACAG CAACTTCTGA 7200

CAATAAACAT AAGTCTTTAC ATCAAATGAT TAATCATCAA TTTGTGCAAT CACCAGTCAA 7260

ATGTACTAAA GCAGAGTTTA TAAAACGTTT AGCATGCCTT CCGTCTCATA TTTATAGGTT 7320

GAAAGGGTTT ATGACATTTG AAGACACCGC ACATACGTAT CTCATTCAAT TTACACAAGG 7380 
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CGGAAAGGGT ATTTCAAAAG AAQACTATCA 
ATGTTTGGAA CAGTAGTGTT TTCAGTGGAA 7500

GAGAATGGTT AACATGCCTT CATGTATAAT AACGAGTTGA TTTGAACGTT TAAGCGTAAA 7560

TAAAAATAAG CTTGGTCAGC CATCAAATAT AATTTGAAAA CTGTCCAAGC TGTTTTATTA 7620

GAGAACAATC AATTAACCCC ACATATTTAA TAATACATCA GCAAAGCCTT GAGGTTTTTG 7680

AATATAACCT AAGTGACCGC CTGGAATATC TACAATAGGT ATGCCAGTTT CTTTATTTAT 7740

ATAAAAGTTA ACATCTTGTG GGAAGGAGCC TCTAGAATCT GTCCCATTTA GTAGGGTGAT 7800

TTTATCGCTG TATTTTGTGA AATCATCCAA AGTAATATCT GAATGCGTAT ATTGTCTAAT 7860

TTCAAATTCT GACCAGAACA TCGTACGTTT GTACTGTTCT ATACGTCCTT CTTCAGTATC 7920

AGCAGGTTGA GACATCATTT TTGCATCAAT TGGTGCGATA TTTAATGTTT CGCCAAATGT 7980

TTTCATGCCT TTTTCTAAGC CTTCTGTTAA AATTTGATGC ACAATGTCAT CATTTTTATC 8040

TTTCCAATAA GTACTGTCTG GTAAAAATGT ATTAATTGGT GGTTCGTGAA ATGCAATCTT 8100

TTTAACGACT TCAGGGTAAT CTTTTAACAC ATGCATCGCA ACGATTGAAC CTGAACTTGA 8160

ACCTAATATA TAGACAGGTT CATCACTTAA TGACTTTGCA AGTTCGGCAA TGTCCTGTGC 8220

GTCGCGTTTG ACACGATAAT CACTGTCAGG GTTTGAAGCG GAATCAGGGA GTGGTTCAGT 8280

TAACTCGCTT TCTCCATAAT CACGACGATC AACGGCTACA ACAGTAAAAT GGTCTTTTAA 8340

CTGTTCTGCA AGAGGCAGAA AAATGTCTCC GGTACCGTTT GCACCAGGAA TAAAGATGAG 8400

CACGGGTCCT TGTCCGACTT GGTGGTATCG TAATTTAGCG CCTTGTAATT CTAAAGTTTC 8460

CATATTCAAT GACCTCCATT TGTTAATTGT TAGGTGATAA ACCTAATAAT TTAGCACCAT 8520

TTGTATAACT TATTTTCTCT TTTTCTTCAT CTGTTAAACC CAGTTCATCT AAAAATACAC 8580

CTAATTTTTC AGGCTCAATA TATGGATAAT CAGCAGCATA AAGAATTCTA TCAATACCTA 8640

CTTCTTTCTT GACTAAATCA AACTGTGGCT TCGTTAACAT GCCACTCGGT GTGATATAAA 8700

AATTATTTTT AAAGTAATAG CTTACAGGGT GGTTCAAATG TTCAGCGAAT AAAGCTTCAT 8760

CCATACGTTC TAAGAAGAAT GGGATAAACT CACCCCAATG TCCAATAATC ATATTTAACT 8820

TTGGATAACG ATCAAAAATA CCAGATAATA CTAGATGTAT TGTATGAATG CCGACATCAA 8880

TGTGCCAACC ATAACCAAAA CAAGCAAATG TTGCCGCAGT TACTTCAGGA TAATTTCCTT 8940

TATAGTATGA TTGATAAATG TCACTGTTAA CTGGCGCGGG ATGTAGATAA ATCGGTACGT 9000

CTAAATTTTC AGCTGTTTTG AAAATAATGT CATATTTGTC TTGATCAAGA AAACCATCTT 9060

GTGCACGTCC CATAATGAGC GCACCTTTGA ATCCTAAATC ATTGATGCAA CGTTCGAATT 9120

CTCGCGCTGC GGCTTCAGGC TCATTGATAG GTAAAGTTGC AAAGCCTACA AAGCGATTGG 9180
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TCTGACCAAC CAAATTTGAA GGAGAACCAT TTCCATAAGA TAAGACTTGA ATTTGAACGT 93 00 

CTTGATTATT CATAAATTGG ATACGTTCAT CATGATGTGA TAATTCGTCG GCATTTGTAA 9360 

5 

AACCTGTCTT TTTTTcAAGG CCTTCTAACA TTACTTTCAT CGGTACACCT TTAGGATCTG 9420 
CTGATATCGC ATTCATCGTT TCTTTTTGAA TATCTTCAAT GACATAATGT TCTTCAAACG 9480 
TAATACTTTT CATTTACTTC GCCTCCATAT TGTATTGCAT GTTTATTGCA TCTATTGCAG 9540 

10 

AAGCATTTTT TATATACCTC TAATTTCAAT GTTTGTAACA TAAAATTGAT CTACCAAGGC 9600 
ATCTCTCCAT CGCCATTAAT AAATGTACCT GTTGGGCCAT CTGCACCAAT CGTTGCTAAT 9660 
15 TGAATGATTG GCTTGATTCC TTCAGAAACG TGTTTGGAAT TATTACTAAA ATCACCAACT 9720 
AAATCAGTAT TTGTAGCGCC TGGATCAGCA GCATTGATTT GCATGTTAGG TAATCCTTTA 9780 
GCGTATTGTA G CGTTAG CAT TGTTACTGCC GATTTAGACG AACAATAAGC TAATGAATTC 9840 
ACTTTAGATT CAGCTGTTTC GGGGTTTGTA ACCATTCCAA ATGAACCTAA ACCACTTGAT 9900 
ACGTTGACGA CAACAGGTTG TTCAGATTTT TCTAAGAGAG GGACGAATGT ATTCATCATT 9960 

CGTACGATAC CGAATACATT CGTTTGATAT ACTTCTTCAA CGTCACGAGG TGTCAATTTG 10020 

25 

GAAGGTGCTG AAAATTGACC AGATATACCT GCATTGTTAA TGAGGATATC AAGACGGCCT 10080 

TCTTTTTGAG CAATCATGTT ATAAGCATTT TTGACTGAGT AGTCACTTGT AACAT CTAAT 10140 

3Q TGTACATAAT GAACACCTAA TTTTTGTGAT GCTTGTTGTC CTCTTACATC ATTCCGAGAA 10200 

CCTATATAAA CTTTGTAACC CAATGCTTTA AGTGCCTCTG CACTTGCATA GCCTAACCCT 10260 

TTATTGCCTC CTGTGATTAA CACAATTTTA GTCATTACGT CCCACCTCAT CTAAATAAAT 10320 

35 GTTTAATAAA TAATTTCTGT ACGCTTCAAT TGAAATATGG CGATGCTCTA TTTGGAAGGC 10380 

AAATACACTA GTTGATAATG ATTGCAACAG CATATCTGTT TTGAAtTCGT GTAAGTGTCG 10440 

TCATCGCTTT TAAATAAGTC ATAATAAAAA TCAAATAATT CTTGATAAAA TGCGCTTTGG 10500 

40 

TAAAAACGTA ATTTATTGTT GCCTGCTTCA ATACATTGCA GTAGTGCCTT ATT AT CG ATT 10560 

TTAAATTGTA AAAGATAATC TAACGACACT TGCATAACCT CATAATTAGA ATGATAGTCA 10620 

TCTTTAATTT GCTTAAAATG AGTGATAAAA ATATCAAGGT CTCTTTGTAT GACGTAGTAG 10680 

45 

CATAAATCGC TTTTATCTTT GAAATGTCGA TACAATGTCC CCATACCGAT ACCTAGTTCT 10 740 

TTAGCAATAC GATTCATACT AATGTTTTCA ACGCCTTCTT CATCAAAAAG TTTGTGCGCT 10800 

so ATTTCTTCAA TTCGTTGCCT ATTCTCTTTT GCATCTTTTC GCATGATTAC ACCTACTTAA 10860 

AATTCTCTAA AATTGACAAA CGGATAACTC TCCGTTTATT ATAAAACGTG TTAAGAAAGT 10 920 

TAGCAATGAA TTTGCAATAA CTATTAAATA TCATAAAAGA AAAGAGTGTT GATAATGTCT 10 980 
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10 






ACCTTATCGG TTCAAATGAT TGCTGAAAAA CTGAATGTCA CTACAGAAGA TGTGGAAAAA 11100 

GTATTAGCTA TGACAGCGCC ACTAGGCATT TTTAGTCATC AATTACAACG ATTTATTCAT 11160 

TTAGTATGGG ATGTCAGAGA TGTAATAAAC GACAATATTA AAGGAAATGG ACAAACACCA 11220 

GAACCATATA CGTATTTAAA AGGTGAAAAA GAGGACTATT GGTTTTTAAG A 11271 
(2) INFORMATION FOR SEQ ID NO: 12: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6261 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CAACCCGTTC AGAACAAAAT AAAAACCGTA CAATTTTATC ATCTTAATGA TTATTGTACG 60

GAAAAACTTT TTTACATCAT ATCTGCATGT GCATAATCGA TATCGGTAAA TTTATTATAT 120

TGTTTCATAA AATGTAACTT AACTGTGCCT GTTGGACCGT TACGTTGCTT AGCAATGATA 180

ATTTCAATTT CACCGTTTTC ATCATTCGTT TGTGGCTCGA AACCACCATC ATCGTCATCA 240

TCTTCATCGC CGCCACGGTT ATAGTAATCA TCACGGTATA AGAATGCAAC GATATCGGCA 300

TCTTGCTCAA TCGAACCAGA TTCACGAATA TCACTCATCA TTGGACGTTT ATCTTGTCGT 360

TGTTCAACAC CACGAGATAA CTGACTTAAT GCGATAACTG GACATTTTAA TTCACGGGCT 420

AATGCTTTTA ATGTACGAGA GATTTCAGAA ACTTCCTGTT GTCTGTTATC GGACGCACGT 480

GAACCACTAC CTTGAATCAA CTGTAAGTAG TCAATCACAA TCATGTCTAA GCCATGTTCT 540

TGCTTTAATC GACGACATTT AGAACGTAAA TCATTAATTC GAATACCCGG TGTATCATCA 600

ATAAAAATCT TCGTACGTGA TAATTTACCT ACCGCTATAG TAAAACGACT CCAATCTTCC 660

TCAGTCATAG TACCCGTTCT TAAGCGGTTT GAGTCAACAT TTCCAGAACT ACAAATCATA 720

CGTGTGGCTA ACTGATCAGC ACCCATCTCT AGCGAGAAAA TACCAACTGT ATACATATCT 780

TCATGCGTTG CAACTTTTTG TGCAATATTA AGTGCGAACG CAGTCTTACC TACAGATGGA 840

CGCGCTGCAA GGATAATTAA ATCATTTCGG TTGAACCCTG CTGTCATTTG GTCTAAATCT 900

CGATATCCTG TAGGTATACC TGGTGTTTGA CCACTATTTT GATCAAGCTC TTCAGCTGTT 960

TCATACACTT GTCCTAAGAC GTCTCGAATG TCTTTAAAGC CATCGCTTTC ACGAGAAGAT 1020

GATAGCTCTA AAATTCGACG TTCTGCATCA CTTAAAATCG CATCTAGTTC AAGTTCATCA 1080

TTATATCCAT CATTGGCAAT ACTATCTGCA GTTTGAATCA ATCTACGTTT TAATGCATGC 1140
TCTGCAAGAT ATTGCGGGCC ACCCGCTTCA TTCAACGTAC CTTCCGTCGA TAATTGATCC 1260

ATCAATGTTA CAACATCAAT TTCTTTATTA TCTTCATTTA AGTGCATCAT TGCACGGAAA 1320

ATATGTTGAT GGGCACCCCT ATAAAACGAC TCAGGAAGCA AAACTTCCTG AGTAGTATTA 1380

ATCAATTCTG GATCTATAAT AATTGAACCT AAGACAGACT GTTCAGCTTC ATTGTTATGC 1440

GGCATTTGAT TTTGCTCATA CATTCTATCC ATGAATGGTT ACACCTCTTA TTTCAATCCA 1500

ACTTTATTGT TCAACTGTGT GTACGCGAAT TGTACCTTCA ACTTCTTTAT CTAATTTAAC 1560

AGGTACATTC GTATATCCTA GGGAATGAAT TCCATTTGGT AAATCCATTT TACGTTTATC 1620

AATTTTAATA TCATGTTGTG CTTTTAGTGC TTCGGCAATT TGTTTTGTAC TTACTGACCC 1680

AAACAATTTA CCACCTTCAC CAGTTTTTGC TGaTACTTCA ACTTCAATGT TTGATAACGT 1740

TTCTTTTAAT GCTTTAgCAT CTTCAATTTC TTGTTGGCGT TCTTGTTTTG CACGTTTTTT 1800

CTGTAACTCT AATTGTTTAA GGTTACCTGG TGTTGCTTCT ACAGCATAAT TCTTTTTCAA 1860

TAAGAAGTTA TTTGCATAAC CTACTGGTAC TTCTTTAACT TCACCTTTTT TACCTTTACC 1920

TTTACCTTTA ACATCTTGTG TAAAAATTAC TTTCATGCAT CTTCACTCCT ACTTAATTGT 1980

TCTGTAATTG CTTGTTGTAA TTGTGCTATC GCCTCTTCGA CTGTCACACC TTTAAGTTGT 2040

GTTGCGGCAT TGGTTAAATG TCCACCGCCA CCAAGTGCTT CCATTGTTAA CTGGACATTT 2100

ACTGAACCGA GTGAACGCGC AGATATACCA ATCAGATTAT CTTCACGTCT CGCAACAACA 2160

TATGATGCTT CAATACCTTC TAAACTTAAC AGTTCATCTG CTGCTTGTGC AACTGTTACT 2220

GGATGATAAA TTTTATCGTC TGAACCATGC GCAATGGCTA TGCCATTATC TTCAACTTTT 2280

ACAGTTCGAA TTAATTCAGA TCGATTAATG TAAGTATCCA CATCATCTTT TAAGAAATGT 2340

TGCGTTAAAA TCGTATCTGC ACCATGTGCA CGTAAATAAC TCGCTGCATC GAATGTTCTT 2400

GATCCTGTTC GTAATGTAAA GTTTCTTGTA TCTACAATAA TACCTGCATA CATCACTGTT 2460

GATTCAAGAC GTGTTAAACG TTGTTCTGTT GGTTGATATT CCAGTAACTC TGTTACCAAT 2520

TCAGCTGTCG AACTTGCGTA TGGTTCCATA TATATCAACA ATGGATTAGA GATGAAGCTT 2580

TCACCACGTC TATGATGATC GATAACAACT TTACGGTTTG CTTTATTTAA GACATTTTCA 2640

TCTAAAACCA GTTCCGGTTT ATGCGTATCA ACAATCACTA CGGTTGTCTT AGATGTCATC 2700

ATATCCCAAG CATCATCTGA TGTAATAAAT CGCTCTCTTA ACTCTGGCTT TTTATCTATT 2760

TCGTTCATCA CGCGTCGTAA TGTTGGATCA ATGTCAGTCT CATTTAATAC GATGTATGCT 2820

TCTAAATTAT TCATCATTGC AAATCTAGAC ACACCGATTG CTGCACCAAT TGCATCTAAG 2880

TCAGGACGTT TATGTCCCAT GATAATGACT TTGTCACCCT CTGCAAGGAT ATCTTTTAAC 2940
CCATAGAAAC GCACATTACC ATTAATACTT TTAATTGCAA CTTGGTCGCC ACCGCGTCCT 3060

AATGCTAAGT CTAGGCCTGA TTGTGATAAT TCACCTAAGT CGATTAAATT TTCAGTACCT 3120

TCACCAACAC CGATACTTAA TGTTAATTGG GCACGATAAC CAACACTTTT TTCACGTAAT 3180

TGACTCAAGA TATCAAATTT AGATTCTTCT AAGTCAGCTA ATATTTTTTG ATTTAAATAG 3240

GCTACGAATT GATCGGAACT GTATCTTTTG AAAAATATAT TATACTCAGT TGCCCATCGA 3300

CTAATGACAC GCGTTACCAT TGAGTTGATT TCCGAACGCT GCGTATCATT CATATTTTGC 3360

GTAATCTCAT CGTAGTTATC TAAAAATAAT GTCGCAATGA TTGGTTTAGA ATTTTCATAT 3420

AGTTCATTTG TTTGTACTTG TTCAGTTATA TCAAAGAAAT AGAGGCAGTG ATCATTCTCA 3480

GAATAACGTA CTTGGAAATG ATACTGATTA TATTCTATTT cAACGGATTT CACTCTATCT 3540

AATTGCTTTA AAATGTTTGG AAATACTTCA TTTACAGATT CAGAAATGAC ATTCGCTTCC 3600

ATATGATCTG TCATAAATTG GTTAACCCAT TCGATGTGAT CATTTTCATC TAAAACAATG 3660

ATACCAATTG GTAAATGTTT GATTGCTTTA TTATTTGTTG TTGAAATTTG AGCACTCAAA 3720

CCATCTACAT AACTATCCAT TTTCATTAAA GCTTGTCTGA ATAAAATGAT GCTAACAATA 3780

ATCATCACGA CAAGAACGAT AGATGCAATT AGTGCTATAA GACTATTAAA GATAAACCAT 3840

ACACCCATTA AAACAATTGC TGTGATGATC ATGATGACAA ATGGTATTAG TAAAGCTTTC 3900

TTAGTGGACT GCCGATTCAT TATTCCACCT CTATTCACTT TTTAGAATTA TTTTTCATGA 3960

TTCGCTTCAA ATTCAAACTT AAATCGATAA CACCAAGTAG TCCTACAATA TGTGTCGTAG 4020

GTGTCAGTAT TGTACCGATA ACCAATAGTA AAATCGTTAC TGCATTCGGC AAACCTTTCG 4080

CTTTACCAAA GAAATGAATA ACACTTAAAC CTTGAATATA CATTACTAAT GATAACACAA 4140

GTTGGAAGTT TAAAAGAATG CTCTGGAACA CACTCGGTTG ACCTGTAAAT AATAAACATA 4200

TGATAACAAT AATGTATATC CATAATAAAA TACCGCTCAT TTGCCACGCG AAAAGTGGCT 4260

TAAATACAGG TGTAGCGATT TTAAATTTTC GTAAAATCGG AAATGTAACG ATTAAGTTAA 4320

TTAAGACGAT TAAAAATGTA ATGATAATGA TGAAACCTGG TAATTGAACG GTCGCTTGTC 4380

TAAACCCTTC TTCTAATATT TGGGTCATAT TCGCATCGGC ACCGCTCATC GTAATCGCTT 4440

CATGTAATGT TTGCTTGAAA GGTTTTACTA TGCTCGCTGA TGGTGGAATC CTTCCGAATG 4500

TTTGTAGTAA CATAAAAGCG ATTAATGAAA TTnArCTCAT CGCTACTGTT GTTACGTATA 4560

ATATTCTTTC TTTAGACGTT CTTTCTTTGA GCAATTGACC AATAATTAAA CTTGCAATTA 4620

AGACTAATAT GATGGCACTT AAAACGAAAG TATTACCTAA AACAGTTGTT ATAATTACTG 4680

TAATAAGTGC ACTAATCCCG AAAGATTGTA TTGATTTATT CCATAAAACG ATACCTGGTA 4740
CAAATACCAA CGCAATCGTT GCAATTATTG TTGCTTTAGG TTGTATTTTT GAAAACACAT 4860

AAGCCACTCC CATATTTTTA ACTATAGCTA TTATTTTAAC CTCTTTAATG AAAATTAACA 4920

ATTTATAGAT TGTATGCTTC TATTTCATTT AATTGAATAA TAACTTTCAT GTTTTATAAG 4980

TAATTAACAT ACTCATTTGA ATCGCTTTTG TGTGCTTTCA TTTTCAACAT GATTATTTAA 5040

TCCCACTACA TAGCAATCAA GCTTGATTTA GATTTACAAT ACATTTCCAC TCTCATGTAC 5100 

TCTAGATGTT TTTGAATATG ATAACTGTGA TTTAGTGGCT TCATTCTTTG AAAATATATA 5160 

TTATTACTTA CGCTTAAAAT GCTTTAAATT TAAGAAATGA TATAAGTTAG GTGCCCAGGT 5220 

ACTAAAGTTT AGTAGGaATC CATCATGCCC AACATTATCA GGCACGAAGA AATGAGGATG 5280 

ATATTTAAAA CGTTCACCTA ATGCACGAAC TTGATCATCC GGATATAGCA AATCATCTAT 5340 

GAACCCCATC GTTAACACTT TTGTTTCTAA ATTTTTAAAA ACATGCGTTA CGTCTGTGCG 5400

ACCTCGGTCA ATGTTGTGAC TATCCAATAC ATCTAGCAGT GTCAGATAAC AATTCAAATC 5460

AAAATGTTCT TTAAATTTAT TACCTTGATG TTGTTGGTAT GCGACTACTT CATCCGGCGT 5520 

AAAACGTTCA TCATAACTTT TTGATGATCG ATATGTCAAA AAACCTAATT GGCGTGCAAT 5580 

ACTTAGACCT TCCTTACCAC CAAGATGAAT GGCTTGCCTT GCAATTTCAT TGAAAGCTCT 5640 

ACTATAAGAT GATGTTCGAC TTGTTGCAGC AAGGATAATG GCTTTATCTA CTTCAAACTG 5700 

TTGATTGTAG AGTAGTTCCA TTGCTTGCAT ACCTCCAAGA CTTCCCCCTA TTAAAATATT 5760 

AATCTTATCA TAACCAAGGG CTTGTATACC TCGTTCATTC GCTCTGACTA TATCTCTTAA 5820 

TGTTAATTTT TTAGGAAAAT GAGGGTCGTT TAAAGGTGAA CTTGAACCGA AAGGACTACC 5880 

AATAACATCA AATGTTAAAA ATTGATAATC GTGAATGGGT ATATATCCCC CATCAATAAT 5940

TTCTCGCCAC CAACCCGGAT AATCATCTGT TCCATATGTT AAATGATTGC CAGTTAATGC 6000 

ATGAeAAACT ACAACTAATG GTTGTCCATG ATAACCGACA TGCTCATATC TCAAACGCAA 6060 

GTnATCTATG ACTTCCCCAG ATTCTGTAAT AAATTCCCCT AAATTTAAAG TATCTACTGT 6120

GTAATTTGTC ATTGTTCTTT CCTCCTTAAA CAAAAAAACT TCTCACCCTA TTGAAAAGTA 6180 

AGAAGTCTTT ATACTTATCA TTCGAGTAAC TCGTTGGTTT TAGCACCGTG CTATAAAGTC 6240 

GGTTGCTGAA GTATCACAGG G 6261

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1222 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

ATGCGATTAA CTCTGGAAAT ATCTTTTCCA TATTTACGTn TTAAATTATT CAGCAAATTC 60 

ATACGAGaTT CATACTCGTT yAACACTTGT TCGTCGAATT CTGTATTAGC CATTTCATCA 120 

TATAACTCAT GTTTTGCATC TTCTAAAATG TAGTAAAATT GATCAATATC TTCTTTTAAT 180 

TTGTCATATT TGTTTGGAAC TATATCGTTT ATTGTTAACA AATGGTTGCT TAGTTCATAT 240 

AAACGATCAG TGATAGCATT TTCATCCGTT AATGTCATAT ATGCGTTATT AAGCGCTAAG 300 

CTTAATTTTT CAGAGTTTTG AATGCGTTTA ATATCTATTT CAAGTTGCTC TATTTCGCCT 360 

TCTTTTAGAT GTGCTTCAGA CAATTCTTCT AATTGGAATT TCATTAAATC TAAACGCTGT 420 

AGCAATGCTT GGTCTGCTGA TTCTAAATCT TCTAACTCTT GCTTTTTGGC TTTATAATTT 480 

TGAAAAGTTT GGTGATATTT ATCCAACAAA TCTTGATAAC GTGATTCTGC GTAATTATCC 540 

AATAATGTTA AATGGTATTT TTGTTTCAAC AAAGACTGCG TTTCATGTTG GCCATGAATA 600

TCTAATAATT CTTGCATAAC TTTTCGTAAA TCTTGTAAAG TAACTGTTTG ATTATTAATT 660 

TTACAAAGAC TTTTACCAGA GCTGAAAATT TCCCGTTTAA CTAATAAAAA ATCTTCATCT 720 

ACATCAATAT CCATATTTTT CAATATATGT ATAGCATCTT TACTCTCGTC AATATCAAAT 780 

ATACCTTCGA TGACAGCCTT TTTTTCACCA TGTCTTACAA AATCAGATGA AGCTCTCATT 840 

CCAATTAATT GTCCAATTGC ATCTATAATA ATTGACTTAC CTGAACCCGT TTCACCACTT 900 

AAAACAGTTA AACCATCAGA AAATTGAATT TCTAATTCTT CAATAATAGC AAATTGCTTG 960 

ATTGATAAGG TTTGTAACAT AAACTCATCG CATCCTTATA ACAAATTGAA AATTCTTGAC 1020 

TTGATTTCAT CACTTGCCTC TTTGCTTCGA CAAATAATTA AACAAGTATC ATCACCACAA 1080 

ATTGTGCCTA GTACTTCTTC CCAATTGATT TGGTCTAATA TAGCTCCAAT AGATTGTGCA 1140

TTACCAGGTA TGTTTTTAGA ACAAGTAAAT TATCAGTACC ATCTATATTA ACAAAGGAAT 1200

CCATTAAATA ACGTCCCAAT TT 1222
(2) INFORMATION FOR SEQ ID NO: 14: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1021 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

TTTGTTATTA TTACnTnAAA TAATTGCATT ACTTTTTACT GATGGTACAA CTTTCCATCC 60

TTCTTTTGGC ACGACATAAT TGTCTTTATC TTGAACTAAA TATCCGCCAG ATACTGAAAC 180

AAACTCTTCT TCGTTACTGT CTATAGTCAT ATCAATTTCT AATAATCTTA CATTCTTCTT 240 

TTGTTTTAAA ATATCTAATG CTTCATCTGT AAATTTTGGT GCAATAATGA CTTCCAAAAA 300 

GATACTATGC AATTGCTCTG CTAACTCAGG TGTTACAGCT CGGTTTAATG CAACAATTCC 360

ACCAAATATT GATTGACTAT CCGCTTCATA CGCATGTTGA AATGCTTGTT CTATCGTGTC 420

ACCGATACCA ACACCACATG GATTCATGTG TTTAACCGCA ACTGTAGCAG GTGTATCAAA 480

CTTTTTAACT AAAGCTAGTG TAGCATCTGC ATCTTTAATA TTGTTATAGC TTAATTGTTT 540

CCCATGTAAT TGTTTAGCGC CTGCAATCGT GTGCTTAGCA TTCGAAGTTC TCACAAAATA 600

CGCTGATTGT TGTGGATTTT CTCCATATCT TAAAGTTTCT TTATCCCCTT TAAAGAAACG 660 

TACAATCGCT TCATCATATT CTGCAGTATG CTCAAAAACT TTAATCATTA ATGATTGTCT 720 

ATATGACTCA TCTAACGAAT CGTTTCTTAA TCGCGTCAAT ACTTCTTGAT AATCTGCCGG 780

ATGTACAATT GTTGTTACAT GTTTATAGTT TTTAGCTGCA GCACGTAACA TTGTTGGACC 840 

ACCAATATCA ATATTTTCAA TTGCTTCGTC CATCGTCACA TCAGGGTTTG CAACAGTTTG 900 

TTGGAATGGA TATAAATTAA CTACTACCAT ATCAATTAAA TCTATATGTT GTTCTGATAA 960 

TTCATTTAAA TGCTGCGGTT TATTTCGATC AGCTAAAATG CCACCATGAA CAGCCGGATG 1020 

T 1021 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3759 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

TCATTCACTC CTAAATTGTT ATTACACTAT TACACaTAGC TAATCATCAA TGTGAAATCA 60 

CCTTCAAAGA CACTATCCAA ATCTTCAGAA GTCAAAATAA AGTTTGTACC AGTAGTCAGT 120 

TTGAAAATTT CACCATCGAC AATCATTTGC CCTTCGCCTT CCAACACTGT AACTAAACAG 180 

AACTCTCTAG GCTTCATATA ATTTAACGTG CCAGAAATTT CCCATTTAAC CAATGTAAAG 240

AAATCATTCG ATACAATGTG TGTACACTTA TGGTTTTCAA TAATTTCGCT TTCAGGCAAA 300

ATATTAGGTA ATGGTGCATT GTACTGAATA ACGTCTAAAG CTTTTTCAAT ATTTAACGGT 360 

CTATCATTAT ATTGATTATC TTGACGATTG AAATCATAAA GTCTATATGT AATGTCTGAC 420 
ATAAAAtAGa ATTCyCCAGG kTTTACtTTA AtatATCyAA gTAtCGaCtC tATCGTTCCG 540

TGTTGAACAT GATTCGCAAC TTCTTCTCTA GACTCTGCTA ATGTCCCtAT AACTATTTCT 600

GCATCTTCTT CTGCATCTAT AATATACCAA CATTCAGATT TGCCATATTG CCCgTTTTCA 660

TGCTCATAAG CATAAGAATT ATCAGGGTGC ACATGAATAG AAAGTGATTC TCTTGCATCC 720

ACTATTTTAG TTAGAAGCGG AAAATCTTTG CTTGGGAAAT CACCAAACAA TTCACGATGT 780

TCTGACCAAA TACGGTCTAA TGTTTGACCT TGATATGGTC CATTAATAAT CTCGCTCGTA 840

CCATTTGGAT GTGCTGACAC ACACCAACAT TCCCCCAGTT GTATCATTGT CTAATTGATA 900

TCGAAACTCA CTTAGACGTT GACCGCCCCA TAATTTTGTT TTTAAAATTG GTTGTAAAAA 960

TAATGGCATT GTTGCACCTC CATTGTGATT AAGTAAGCAA TAGAACTCTG ATGTTGTTGT 1020

TCCATTATAT TTTGATTTTG TTCTCATTTA CATCGTATTA TTAACTTCCA CATTTCAAAT 1080

TAACTATTAG TGATTGTACC ATATTTACTA ACATTGCAGT ACTGCCAATT AAAAGnGCTT 1140

CACTTAAATT TACAGTACTT TAACATTTTC AAAAATTTAT AGCATAGAGA TTATATCTCT 1200

CTTACATTTG TACATATTTC CCTTTAAATT TACTCGCCCA TTATACCAAT TAATAaACAA 1260

CTTTAATAGT TGTGCCATAC ATTGTTCAAA TTCTTTGTAA AACGCATAGA CAATACGTAC 1320

TTATTCATAC TTATAATTCA TCATTTTCAA AAAATAACGA GTTACGAAAA AGTAACCCGC 1380

TTCAAATCAT ATTTACTATC CTTATTAATC CGTTTCATTT TCAAATTGAG TTAAAGCATC 1440

TTTAATGTCC TGATCACCAC TAATAATTTG AAACTCTTGG TGATTAAAAT GATTGGATGT 1500

GACAATTTCT TTTAATACTG TCGCAACATC TTCTCTAGGA ATTTCACCTT TACCATCAAA 1560

ATATTGTGCA GCTTCTATCT TTCCAGATCC TGCTGCATTT GTAAGTGCCC CTGGATGTAA 1620

AATTGTATAA TTCAAACCTG nAACGTCTTA AATAGTCATC AGCGTAATGT TTAGCTATTG 1680

TATATGGCTT TAAATCACCG CTATCATCAA AAGCCTGACG TCTCGAATCA TATGTTGAAA 1740

CCAfGACATA GTGTTTAATA TTGGCCTCTT TACTCGCAAT CATTGATTTA ACAGCACCAT 1800

CTAAATCGAC AATAATTGTT TTATCTGCAC CCGTGTTCCC TCCAGAACCT ACTGAAAAGA 1860

TAACTTTATC GAATGGTTTA AACGTCTCAG TTAAAGTCTC TATTGAATCA TTTTCAACAT 1920

CAACAAGAAT TGCTTTCATA CCTTGTGATT TTAACGCATT AAGTTGATCT GATTGCCTAA 1980

CACCAGCAGT AAATGGTACA TTTTCTTTTG CTAATTGTTG CACTAGTAAC GAACCTACAC 2040

CGCCATTAGC ACCTATAACC AAAATATTCA TTTACAACAC TCTCCTATkT ATTATTCTCT 2100

ATGCCATACC ACTTTATGAG ATATGTAAAA CTTGTTACAA CTATAAAAAT CAATTGACAT 2160

ACTACTGGGA ACGTATTAAA TTAATATATG AACAAATATT CATATGAAAG GATTGTCATA 2220
tCaAGGCATT AGcGATTACA ATCGAATACG TATCaTGOAA TTGTTATCaG TCAGCGAAgC 2340

AAGTGTTGGT CACATTtCAC ATCAATTGAA TTTATCTCAA TCAAATGTCT CGCACCAATT 2400 

AAAATTACTT AAAAGTGTGC ATCTTGTGAA AGCAAAACGA CAAGGCCAAT CAATOATTTA 2460 

TTCATTAGAT GACATCCACG TAGCAACTAT GTTAAAGCAA GCCATACATC ACGCGAATCA 2520 

TCCTAAAGAA AGTGGGTTAT AATATGTCTC ATTCACATCA TCATCATOAC CATATGCATA 2580 

GTCATGTAAC TACAAATAAT AAGAAAGTAT TGTTTATATC GTTTTTAATA ATCGGTCTAT 2640 

ATATGTTTAT CGAAATCATC GGCGGTCTCC TTGCTAACAG CTTGGCATTA CTATCTGACG 2700 

GTATCCATAT GTTTAGCGAC ACATTCTCAT TAGGTGTTGC ACTTGTCGCA TTTATTTATG 2760

CTGAAAAGAA TGCCACAACT ACAAAAACAT TTGGTTATAA ACGTTTCGAA GTACTCGCAG 2820 

CGTTATTTAA CGGTGTAACG CTTTTTGTAA TAAGTATTTT GATTGTTTTT GAAGCGATTA 2880 

AACGTTTCTT TGTTCCTTCT GAAGTTCAAT CAAAAGAAAT GTTAATCATT AGTATTATCG 2940

GTTTAATTGT CAATATCGTT GTTGCATTCT TTATGTTTAA AGGCGGCGAC ACTTCACACA 3000 

ATTTAAATAT GCGTGGTGCT TTTCTACATG TTATCGGAGA CTTATTAGGT TCAGTTGGCG 3060 

CCATTACTGC AGCTAkTTTA ATTTGGGCAT TTGGATGGAC AATCGCCGAT CCTATCGCAA 3120 

GTATTTTAGT TTCCGTTATT ATTTTAAAAA GTGCTTGGGG TATCACAAAA TCTTCAATTA 3180 

ACATTTTAAT GGaAGGCACA CCAAGTGATG TTGATATAGA TGAAGTTATA ACTACTATTA 3240 

AAAAGGATTC ACGAATACAA AGTGTGCATG ATTGCCATGT TTGGACAATT TCAAATGATA 3300

TGAATGCATT GAGTTGTCAT GTTGTTGTAG ACCATACATT GACAATGAAA GAATGTGAAT 3360

TATTATTAGA AAaCATTGAG CATGATTTAT TACATTTAAA TATTCACCAT ATGACTATTC 3420

AATTAGAAAC GCCTAATCAC AAACATGATG AATCGATTAT ATGTTCAGGA ACACATAGTC 3480

ATTCACATAA CCATCATGCT CATCATCACG CGCATGTACA TTAATAATTT TAACCTACTG 3540

CCATTGCATC GATTAAACTT TTCAATGGCA GTAGGTTTTT TATGTCTTTA TGGCGACTTG 3600

TTTGGTCTTT GATGATGCAA TGTTTATTAA CAAATTTTCA ACTATTATTT CTTACATTAG 3660 

TCATATTTTT GACAATTTAC TATTATAATT CTCTAACTTT AGTCACTTTA ATTAATTTTT 3720 

ATTAGATATT AATATGAAAA TAACGTGTTT TTTGTTATT 3759

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 13086 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

TAATTATCGC GCATAACAAA ACATTAGCAG GACAATTATA TAGTGAGTTT AAAGAATTTT 60

TTCCTGAAAA CAGGGTGGAA TACTTTGTAA GTtACTATGA TTATTATCAn CCAGAGGCAT 120

ACGTACCGTC TACTGACACT TTTATTGAAA nAGATGCCTC AATCAnTGAT GAAATTGATC 180

AACTACGACA TTCTGCTACA AGTGCATTAT TTGAACGCGA TGATGTAATT ATTATTGCTA 240

GTGTAAGTTG TATATATGGT TTAGGTAATC CTGAAGAATA TAAAGATTTA GTAGTAAGTG 300

TTCGAGTTGG TATGGAAATG GATAGAAGTG AATTACTTAG AAAACTTGTc AGATGTGCAA 360

TATACACGAA ATGACATCgA TTTcCAACGA GGAACGTTTC GAGTGCGTGG TGATGTAGTG 420

GAAATATTCC CAGCCTCTAA AGAAGAACTT TGTATAAGGG TTGAGTTTTT CGGCGATGAG 480

ATTGACCGTA TCCGAGAAGT TAACTACCTA ACAGGTGAAG TGTTGAAAGA AAGAGAACAT 540

TTTGCGATAT TCCCAGCTTC TCACTTCGTA ACACGTGAAG AAAAGTTGAA AGTTGCGATT 600

GAAOGTATTG AAAAAGAATT GGAAGAACGA TTGAAAGAAT TACGAGATGA GAATAAATTA 660

CTAGAAGCGC AAAGGTTAGA ACAGCGTACC AACTATGATT TAGAAATGAT GCGAGAGATG 720

GGATTCTGTT CAGGAATTGA AAACTATTCC GTACATTTAA CTTTGCGACC ACTGGGTTCG 780

ACACCATATA CTTTATTGGA TTACTTTGGC GATGATTGGT TAGTAATGAT TGATGAATCA 840

CATGTGACAT TACCGCAAGT TCGAGGCATG TATAACGGAG ACAGAGCGCG TAAACAAGTT 900

TTGGTGGATC ATGGGTTTAG ATTACCGAGT GCATTAGATA ACCGTCCACT TAAATTTGAA 960

GAATTTGAAG ttiAAAGACAAA ACAACTTGTG TATGTATCTG CAACGCCTGG ACCATACGAA 1020

ATTGAACATA CGGATAAGAT GGTTGAACAA ATTATTCGTC CTACTGGTTT ACTGGATCCT 1080

AAGATTGAGG TTAGACCTAC TGAAAATCAA ATTGACGATT TATTAAGTGA AATTCAAACA 1140

AGAGTgAGCG TAATGAACGC GTACTTGTTA CAACGCTCAC TAAAAAGATG AGTGAAGATT 1200

aACCACATAC ATGAAAGAaG CGGGTATTAA aGTtAATTAT CTGCATTCAG AAATCAAGAC 1260

ATTAGAACGA ATTGAAATAA TTAGAGACTT ACGAATGGGT ACATATGATG TTATCGTAGG 1320

TATTAATTTA TTAAGAGAGG GTATTGATAT ACCAGAAGTT TCTCTAGTTG TCATATTAGA 1380

TGCAGATAAA GAAGGGTTTT TACGTTCTAA CCGCTCATTA ATTCAAaCAA TAGGTAGAgC 1440

TGCGCGTAAC GATAAaGGTG AAGTCATTAT GTATGCCGAT AAAATGACTG ATTCGATGAA 1500

GTATGCAATT GATGAGACAC AACGTCGTCG AGAAATACAG ATGAAACATA ATGAAAAACA 1560

TGGTATTACA CCTAAAACAA TTAATAAAAA AATACATGAT TTAATTAGTG CTACTGTTGA 1620

AAATGACGAA AATAATGACA AAGCACAAAC TGTGATACCT AAGAAGATGA CGAAAAAAGA 1680
TTTCGAGAAA GCTACAGAAT TAAGAGATAT GTTATTTGAA TTAAAAGCAG AAGGGTGACA 1800

AGTAAATGAA AGAACCATCC ATAGTAGTAA AAGGTGCTCG TGCGCATAAC TTGAAAGATA 1860

TTGATATCGA ACTACCTAAA AaTAAATTAA TTGTTATGAC AGGTTTATCT GGGTCAGGTA 1920

AATCGTCATT AGCATTCGAT ACTATATATG CTGAAGGACA ACGACGTTAT GTTGAATCAT 1980

TAAGTGCCTA TGCGCGTCAA TTTTTAGGCC AAATGGACAA ACCAGATGTT GATACAATTG 2040


AAGGATTATC 


GCCAGCAATT 


TCAATAGATC 


AAAAAACAAC 


AAGTAAAAAT 


CCAAf? AT C.AA 


«1UU 




CTGTAGCAAC 


AGTAACAGAA 


ATATATGATT 


ATATAPflTTT 


f3*TT A T ATrt (** A 


fYSTfSTTfVSTA 


ZlOU 


15 


AACCTTACTG 


TCCAAATCAC 


AATATAftAAA 
**** x.#* x.nvxnnn 


TTY5 A A TfY2 P A 


& Ar*Af2TAr»a a 


LAAA 1\Aj 1 1\» 


2220 




ACCGCATTAT 


GGAATTAGAG 


GCAPGTAPJVA 


Arz anr a att 

nwtl X \— *-vrVX X 


All /VjLALL X 


v7 1 LA 1 Lv?L 1 L 


22BU 




ATCGTAAAGG 

A \*\* X/W\W 


TAG TCATGAA 


AAf2(TA ATf"Y2 


A Afi ATATTYS/l 
AnlanlAl luu 


IAAAAAAUvj 1 


x A 1\» 1 Avjlj 1 1 


2340 


20 


TAAnAATfY^A 


TCZCl Pf2 A A ATT 


nTTY2 aTRT a a 

VjI IvAiulAA 


A 1 LiA i\j 1 ALL 


TACTTTAGAT 


AAGaACaaGA 


2400 






AUivvJ X X \J X X 


\a 1 AUALLlJAl 


TAGTTGTTAA 


AGATGGAATT 


GAAACACGAC 


2460 




XnUL 1 Lj/\L A L 


1/\1 >ioAAAV_ X 


v>LA. X lAvmuV. 


TTTCAGAAGG 


acaattaaca 


GTCGATGTCA 


2520 


25 




n r* n rPTTa ar? 
ALjMLLX lnnu 


1111 LALjAAA 


GCCATGCTTG 


TCCTATATGT 


GGATTTTCAA 


2580 




X v_. k^Vjrt \J VJ A X 


.rt VjM/IL LAAvtA 


A 1 Vj 1 1 lAutl 


1 lAALAljlLL 


TTTTGGTGCT 


TGTCCGACAT 


2640 




*J X v*rt X X X 




i*pa a r , ar2Tr , r* 


A X La 1 AtiAL 1 X - 


GGTTGTTCCC 


GACAAAGATA 


2700 


30 




mzv af^^TY^r 1 a 




LjVjA 1 ALLoAL- 


GAGiT ITGAT 


TTTTATCCAA 


2760 






7\ *" i fa *» 1 *» I 'tftl^T* 

^VLvjXLjI 1 i<J i 


Vsnnu 1 1 1M1A 


AAAALAA1A1 


LrtjAl AAALL 1 


TTTAAAAAGT 


2820 


35 


TAACAGAACG 


TP A ArfiTn A T 


A *f*^r^n T A ^PT^l T 
*%X 1 X X f\ ± IVji 


A f TW2TTr v TV*2fi~ 


luALAAAuAA 


ATTGAATTTA 


2880 


CATTTACACA 


ACGTCAAGGT 


fVSTArTAnAA 


narnaaraaT 

nnLunHLnnl 


- ItV? X X X X LvAv 


wjt 1 Vj 1 ALj 11 L 


294 0 




CTAATATAAG 


TAGACGATTC 


CATGAATCrrC 




T A f APfiTYZ A A 


1 v»A 1 oAVj 1 A 


3 00 0 


40 


AATATATGAC 


TGAACTACCT 


TGCGAAACTT 


GTCATGGAAA 


vjvvjrt x x vinu x 


X UAnuLK X 


J UbU 


TATCTGTTTA 


TGTAGGTGGT 


TTAAATATTG 


GTGAAGTAGT 


CGAATATTCA 


ATC AfiTC A Af2 


J Xjt KJ 




CGCTGAACTA 


TTATAAAAAC 


ATTGATTTGT 


CAGAACAAGA 


TCAAGrCGATT 


GCAAATPAAA 


it on 
JlOv 


TATTGAAAGA AATTATTTCC CGACTCACTT TTTTAAATAA TGTGGGACTT GAATATTTAA 3240

CGTTAAACAG AGCTTCAGGT ACACTTTCAG GTGGTGAAGC ACAACGTATT CGATTAGCAA 3300

CGCAAATTGG GTCGCGTTTG ACTGGTGTCT TATATGTATT AGATGAGCCA TCAATTGGAC 3360

TGCATCAAAG AGATAATGAT CGATTAATTA ATACACTTAA AGAAATGAGA GATTTAGGAA 3420

ATACTTTAAT TGTAGTTGAA CACGATGATG ATACAATGCG TGCGGCTGAT TACTTAGTGG 3480
AGGTAATGAA AGATAAAAAA TCATTAACAG GACAATACTT GAGTGGTAAG AAACGTATTG 3600

AAGTACCTGA ATATCGCAGA CCGGCTTCAG ATCGTAAAAT TTCTATACGT GGAGCTAGAA 3660 

GCAACAATCT TAAAGGGGTT GATGTGGACA TACCACTATC AATCATGACG GTTGTTACAG 3720 

GTGTATCAGG TTCTGGTAAA AGCTCATTAG TAAATGAAGT ATTATACAAA TCATTAGCTC 3780 

AAAAAATTAA TAAATCTAAA GTAAAGCCAG GATTGTACGA TAAGATTGAA GGTATTGATC 3840 

AACTTGATAA AATTATTGAT ATTGATCAAT CACCAATAGG TAGAACGCCA CGCTCTAATC 3900 

CAGCAACATA TACTGGTGTG TTTGATGATA TACGTGATGT GTTTGCGCAA ACAAATGAAG 3960 

CTAAAATTCG AGGATATCAA AAAGGGCGTT TTAGTTTTAA TGTAAAAGGT GGACGCTGTG 4020 

AAgcTTGTAA AGGTGACGGT ATTATTAAAA TTGAAATGCA TTTTTTACCT GATGTTTATG 4080 

TTCCTTGTGA AGTGTGTGAT GGTAAACGAT ATAATCGTGA GACACTAGAG GTTACTTACA 4140 

AAGGTAAAAA TATTGCTGAC ATTTTAGAAA TGACTGTTGA AGAAGCAACA CAATTTTTTG 4200 

AAAATATTCC TAAGATTAAG CGCAAGTTAC AAACACTAGT TGATGTTGGT CTTGGATAOG 4260 

TCACATTAGG TCAACAAGCT ACAACGTTAT CAGGTGGTGA GGCTCAACGT GTGAaACTTG 4320 

CATCTGAACT TCATAAACGT TCAACTGGTA AATCTATTTA TATCCTAGAT GAACCGACAA 4380

CAGGGTTACA TGTTGACGAT ATTAGTAGAT TATTAAAAGT ATTAAACCGA TTAGTTGAAA 4440 

ATGGTGATAC TGTTGTAATT ATTGAACATA ACCTAGATGT TATCAAAACA GCAGACTATA 4500 

TTATAGACTT AGGTCCTGAA GGTGGTAGTG GCGGTGGTAC TATTGTTGCG ACTGGCACAC 4560

CCGAAGATAT TGCTCAGACA AAGTCATCAT ATACAGGAAA GTATTTAAAA GAAGTACTTG 4620

AACGAGATAA ACAAAATACT GAAGATAAAT AAGATTAAAA GAAGTGAAGG ATGTTATAAA 4680

TTTATCCTTC GCTTCTTTTT ATTAATTTAG TAATGAATAG TAGAAAGAAA AGATGCGTAA 4740

AAAGAATTAT GTTAAGATAG GGTCAATCTA GAGTAGTTAA ACATAAATCG AACTGGGAGT 4800

GGGACAGAAA TGATAAAGAA TCACTAATGA TTTATTATGT AGTGGTTCTT TGTCATTAGC 4860

CACAGCTATT GTGTACTTAA AAATAGGaat GCaTgAGTGC AACTCATGCA TAAGaAATAC 4920

TAATTTCTAA AGAAAAAGTA TTTCTTTATG TTGGGGCCCC GCCAACTTGC ATTGTTTGTA 4980

GAATTTCTTT TCGAAATTCT TTATGTTGGG GCCCCGCCAA CTTGCATTGT TTGTAGAATT 5040 

TCTTTTCGAA ATTCTTTATG TTGGGGCCCC GCCAACTAAT TCCAATATAT CATTGTAGAG 5100 

CTTAGGTCAT TGATTTTTGG CTCGGACTTT TATGGCGATA TGAACCATGT AAATTAAGCA 5160 

AGCAATAAAT TAATGATTGA TATTGACTTG TAAAATAATA ACAATAATGA ACAATTAATA 5220

TTTATTTTAG CTTTTCAATG TAGATTGGTG TTATATTTTT GATATGATAA GAAGAGATGT 5280 
ACATTAAAGT TAGATTTAAT CGCTGGTOAA GAAGGACTAT CGAAGCCAAT TAAAAATGCT 5400

GATATATCAA GACCGGGCTT AGAGATGGCA GGTTATTTTT CACATTATGC GTCAGATAGA 5460

ATACAACTAT TAGGAACAAC GGAACTATCG TTTTACAATT TATTACCAGA TAAGGATCGC 5520

GCAGGTCGTA TGCGTAAACT ATGCAGACCA GAAACGCCTG CAATTATTGT GACACGTGGA 5580

TTGCAGCCAC CAGAAGAATT AGTTGAAGCT GCAAAAGAAT TAAATACCCC ACTTATAGTT 5640

GCTAAAGATG CGACTACAAG TTTAATGAGT CGCTTAACAA CGTTTTTAGA GCATGCACTT 5700

GCAAAGACGA CATCTTTACA TGGTGTTTTA GTAGATGTTT ACGGTGTTGG TGTACTAATT 5760

ACCGGTGATT CAGGAATAGG TAAAAGTGAG ACTGCGTTGG AATTAGTTAA ACGTGGGCAT 5820

AGATTAGTAG CAGATGATAA TGTAGAAATA CGTCAAATTA ATAAAGATGA ACTAATAGGG 5880

AAACCACCAA AGTTAATAGA ACATCTATTA GAAATACGTG GACTAGGTAT TATCAATGTT 5940

ATGACTTTAT TTGGCGCGGG TTCAATATTA ACTGAAAAAC GAATTAGATT AAATATTAAT 6000

TTGGAAAACT GGAACAAGCA AAAGTTATAT GACCGOGTAG GTCTTAATGA AGAGACGCTA 6060

AGTATTTTAG ATACTGAAAT CACTAAAAAA ACAATACCTG TAAGACCTGG TAGAAATGTT 6120

GCGGTAATTA TTGAGGTCGC TGCAATGAAC TATCGATTAA ATATCATGGG CATTAACACG 6180

GCCGAAGAAT TTAGTGAAAG ATTAAATGAA GAAATTATCA AGAACAGTCA TAAGAGTGAG 6240

GAGTAGGTTG AATGGGTATT GTATTTAACT ATATAGATCC TGTGGCATTT AACTTAGGAC 6300

CACTGAGTGT ACGATGGTAT GGAATTATCA TTGCTGTCGG AATATTACTT GGTTACTTTG 6360

TTgCACAACG TGCACTAGTT AAAGCAGGAT TACATAAAGA TACTTTAGTA GATATTATTT 6420

TTTATAGTGC ACTATTTGGA TTTATCGCGG CACGAATCTA TTTTGTGATT TTCCAATGGC 6480

CATATTACGC GGAAAATCCA AGTGAAATTA TTAAAATATG GCATGGTGGA ATAGCAATAC 6540

ATGGTGGTTT AATAGGTGGC TTTATTGCTG GTGTTATTGT ATGTAAAGTG AAAAATTTAA 6600

ACCCATTTCA AATTGGTGAT ATCGTTGCGC CAAGTATAAT TTTAGCGCAA GGAATTGGAC 6660

GCTGGGGTAA CTTTATGAAT CACGAGGCAC ATGGTGGATC GGTGTCACGC GCTTTTTTAG 6720

AACAATTACA TTTGCCTAAT TTTATAATAG AAAATATGTA TATTAACGGC CAATATTATC 6780

ATCCAACATT CTTATATGAA TCCATTTGGG ATGTCGCTGG ATTTATTATC TTAGTTAATA 6840

TTCGTAAACA TTTAAAATTA GGAGAAACAT TCTTTTTATA TTTAACTTGG TATTCAATTG 6900

GTCGATTCTT TATAGAAGGA TTACGTACAG ATAGCTTAAT GCTCACAAGT AATATTAGAG 6960

TTGCACAATT AGTATCAATT CTTTTAATTT TAATAAGTAT AAGTTTAATT GTATATAGAA 7020

GGATTAAGTA TAATCCACCG TTGTATAGCA AAGTTGGGGC GCTTCCATGG CCAACAAAAA 7080
TTATGGCGTG TATACCGTCT TGTTAAATTT TCGAAAGTTT TTAAGAATGT AATTATCATT 7200

GAATTTTCGA AATTTATTCC AAGTATGGTA CTGAAAAGAC ATATATATAA ACAACTTTTA 7260

AATATTAATA TCGGTAATCA ATCGTCGATA GCTTATAAAG TAATGTTAGA TATTTTTTAC 7320

CCAGAACTGA TTACGATTGG TAGTAACAGT GTTATTGGTT ACAATGTAAC AATTTTGACG 7380

CATGAAGCAT TAGTTGATGA ATTTCGTTAT GGACCAGTGA CGATAGGATC TAACACTTTG 7440

ATTGGTGCAA ATGCTACCAT TTTACCCGGT ATAACGATTG GTGACAATGT AAAAGTTGCA 7500

GCTGGTACGG TTGTTTCAAA AGATATACCG GATAATGGAT TTGCATATGG CAACCCTATG 7560

TATATAAAAA TGATTAGGAG GTGACAATTT TATGGCGCAA AAGAATAATA ATGTAATTCC 7620

AATGACTTTT GATGATGCAT TTTATCGTAA AATGGCTAAA CAGAAGTTTA AACAAAGAGA 7680

ATATAAACGA GCTGCTGAAT ACTTTGAAAA AGTGTTAGAA TTGTCACCTG ATGATCTGGA 7740

AATTCAAATT GATTATGCAC AATGTCTAGT GCAACTTGGT ATTGCTAAAA AAGCAGAACA 7800

TTTATTTTAT GACAATATTA TTTATAATAG GCATCTAGAA GATAGCTTTT ATGAATTGAG 7860

TCAGCTCAAC ATTGAAGTTA ACGAACCAAA CAAGGCATTC TTGTTTGGTA TTAATTATGT 7920

TATTGTTAGC GACGACCAAG ATTATAGAGA TGAATTAGAT CAAATGTTTG ATGTGAAATA 7980

TCAAAGTGAA GAACAAATTG AACTTGAAGC TCAATTGTTT GTAGTTCAAA TACTATTCCA 8040

atatcttttt TCTCAAGGTC GATTAAAAGA TGCAAAGAAT TATGTCTTAC ATCAACCACA 8100

AGAAGTTCAA GATCATCGTG TAGTACGTAA TTTATTGGCA ATGTGTTATT TATATCTCGG 8160

TGAATATGAT ACgGCTAAAG CATTGTACGA aGCACtATTA CAAGAGGATA GTACaGATAT 8220

ATATGCATTA TGCCATTATA CTTTGCTACT TTATAACACT AAGGAAAATG AACAATATCA 8280

AAAATATTTA AAAATATTAA ACAAAGTTGT ACCTATGAAT GACGATGAAA GTTTTAAATT 8340

AGGTATTGTA TTAAGTTATT TAAAGCAGTA TCGTGCATCA CAACAATTGT TGTACCCTTT 8400

ATATAAAAAA GGGAAATTTT TATCAATTCA AATGTACAAT GCTTTAGCAT ATAATTATTA 8460

TTATTTAGGT GAAGAAGACG AAAGTCATTA CTACTGGGAT AAATTGAAGC AAATTTCTAA 8520

AGTGGAAATT GGACATGCGC CTTGGGTAAT TGAAAATAGC AAAGAAGTTT TTGACCAACA 8580

TATTTTGCCA TTACTTCAAA GTGATGACAG TCATTATCGT TTATATGGTA utitittatt 8640

GGATCAATTA AATGGTAAAG AAATTGTGAT GACGGAAAGT ATTTGGCAGG TTTTGGAAAA 8700

TCTAAATAAT TATGAGAAAT TGTATTTAAC GTATTTAGTT CAAGGTTTAA CGCTCAATAA 8760

ATTAGACTTC ATTCATCGCG GCTTATTAAC GCTTTACCAT AATGAATTAT TTGTAAGTGA 8820

AAATGATGTA ATGGTTGCAT GGATTAATCA AGGTGAACTC ATAATTGCTG AAAAAGTAGA 8880

TCGAAACGTT ACAAAGAAGC AAATTACAAC ATGGTTAQOC ATAACACAAT ATAAACTGAA 9000 

CAAAATGATT GAATTTCTCT TGAGCATATA GATTTATGAA AAGTTAGATT TATTATATAA 9060 

TGCGCATAAT GATTAATAAT GAGGAGGCGT TAATAAAATG ACTGAAATAG ATTTTGATAT 9120 

AGCAATTATC GGTGCAGGTC CAGCTGGTAT GACTGCTGCA GTATACGCAT CACGTGCTAA 9180 

TTTAAAAACA GTTATGATTG AAAGAGGTAT TCCAGGCGGT CAAATGGCTA ATACAGAAGA 9240 

AGTAGAGAAC TTCCCTGGTT TCGAAATGAT TACAGGTCCA GATTTATCTA CAAAAATGTT 9300 

TGAACACGCT AAAAAGTTTG GTGCAGTTTA TCAATATGGA GATATTAAAT CTGTAGAAGA 9360 

TAAAGGCGAA TATAAAGTGA TTAACTTTGG TAATAAAGAA TTAACAGCGA AAGOGGTTAT 9420 

TATTGCTACA GGTGCAGAAT ACAAGAAAAT TGGTGTTCCG GGTGAACAAG AACTTGGTGG 9480 

ACGCGGTGTA AGTTATTGTG CAGTATGTGA TGGTGCATTC TTTAAAAATA AACGCCTATT 9540 

CGTTATCGGT GGTGGTGATT CAGCAGTAGA AGAGGGAACA TTCTTAACTA AATTTGCTGA 9600 

CAAAGTAACA ATCGTTCACC GTCGTGATGA GTTACGTGCA CAGCGTATTT TACAAGATAG 9660 

AGCATTCAAA AATGATAAAA TCGACTTTAT TTGGAGTCAT ACTTTGAAAT CAATTAATGA 9720 

AAAAGACGGC AAAGTGGGTT CTGTGACATT AACGTCTACA AAAGATGGTT CAGAAGAAAC 9780 

ACACGAGGCT GATGGTGTAT TCATCTATAT TGGTATGAAA CCATTAACAG CGCCATTTAA 9840 

AGACTTAGGT ATTACAAATG ATGTTGGTTA TATTGTAACA AAAGATGATA TGACAACATC 9900 

AGTACCAGGT ATTTTTGCAG CAGGAGATGT TCGCGACAAA GGTTTACGCC AAATTGTCAC 9960 

TGCTACTGGC GATGGTAGTA TTGCAGCGCA AAGTGCAGCG GAATATATTG AACATTTAAA 10020 

CGATCAAGCT TAATTCGAAG TCGAATTAAG ATGTTGAGCT CTAAATTATT TGGATATTTA 10080

TTTTAATAGT GTCATCACAG CGTTAAAATA ATGTCTTACT TTTAAATTAA AGCAAATTAT 10140 

ATAG5AAACT AGAACTTAGT ACGTATCATT TGTGCGTTTC AATGAGTTCT AGTTTTTTTA 10200 

TATGTTATAT TAAACTTATA ACTTTATGGG AGTGGGACAG AAATGATAAA GAGCCACTAA 10260 

TGATTTATTA TGTAGTGGTT CTTAAACATT AGCCACAGCT AATGTGTACT TAAAAATAGG 10320 

AATACATGAG TAAAACTCAT GCATAAGAAA TACTAATTTC TATAGAAAAA GTATTACTTT 10380

ATCGTTGTCC CACCCCAACT TGCACATTAT TGTAAGCTGA CTTTCCGCCA GCTTCTGTGT 10440

TGGGGCCCCG CCAACTTGCA CATTATTGTA AGCTGACTTT TCGTCAgCTT CTGTGTTGGG 10500

GCCCCGCCAA CTTGCACATT ATTGTAAGCT GACTTTTCGT CAGCTTCTGT GTTGGGGCCC 10560

CGCCAACTTG CATTGTCTGT AGAAATTGGG AATCCAATTT CTCTATGTTG GGGCCCACAC 10620

CCCAACTCGC ATTGCCTGTA GAATTTCTTT TCGAAATTCT CTGTGTTGGG GCCCACACCC 10680
ACTCGCATTG CCTGTAGAAT TTCTTTTCGA AATTCTCTGT GTTGGGGCCC CTGACTAGAG 10800

TTGAAAAAAG CTTGTTGCAA GCGCATTTTC ATTCAGTCAA CTACTAGCAA TATAATATTA 10860

TAGACCCTAG GACATTGATT TATGTCCCAA GCTCCTTTTA AATGATGTAT ATTTTTAGAA 10920

ATTTAATCTA GACATAGTTG GAAATAAATA TAAAACATCG TTGCTTAATT TTGTCATAGA 10980

ACATTTAAAT TAACATCATG AAATTCGTTT TGGCGGTGAA AAAATAATGG ATAATAATGA 11040

AAAAGAAAAA AGTAAAAGTG AACTATTAGT TGTAACAGGT TTATCTGGCG CAGGTAAATC 11100

TTTGGTTATT CAATGTTTAG AAGACATGGG ATATTTTTGT GTAGATAATC TACCACCAGT 11160

GTTATTGCCT AAATTTGTAG AGTTGATGGA ACAAGGAAAT CCATCCTTAA GAAAAGTGGC 11220

AATTGCAATT GATTTAAGAG GTAAGGAACT ATTTAATTCA TTAGTTGCAG TAGTGGATAA 11280

AGTCAAAAGT GAAAGTGACG TCATCATTGA TGTTATGTTT TTAGAAGCAA GTACTGAAAA 11340

ATTAATTTCA AGATATAAGG AAACGCGTCG TGCACATCCT TTGATGGAAC AAGGTAAAAG 11400

ATCGTTAATC AATGCAATTA ATGATGAGCG AGAGCATTTG TCTCAAATTA GAAGTATAGC 11460

TAATTTTUTT ATAGATACTA CAAAGTTATC ACCTAAAGAA TTAAAAGAAC GCATTCGTCG 11520

ATACTATGAA GATGAAGAGT TTGAAACTTT TACAATTAAT GTCACAAGTT TCGGTTTTAA 11580

ACATGGGATT CAGATGGATG CAGATTTAGT ATTTGATGTA CGATTTTTAC CAAATCCATA 11640

TTATGTAGTA GATTTAAGAC CTTTAACAGG ATTAGATAAA GACGTTTATA ATTATGTTAT 11700

GAAATGGAAA GAGACGGAGA TTTTCTTTGA AAAATTAACT GATTTGTTAG ATTTTATGAT 11760

ACCCGGGTAT AAAAAAGAAG GGAAATCTCA ATTAGTAATT GCCATCGGTT GTACGGGTGG 11820

ACAACATCGA TCTGTAGCAT TAGCAGAACG ACTAGGTAAT TATCTAAATG AAGTATTTGA 11880

ATATAATGTT TATGTGCATC ATAGGGACGC ACATATTGAA AGTGGCGAGA AAAAATGAGA 11940

CAAATAAAAG TTGTACTTAT CGGTGGTGGC ACTGGCTTAT CAGTTATGGC TAGGGGATTA 12000

AGAGAATTCC CAATTGATAT TACGGCGATT GTAACAGTTG CTGATAATGG TGGGAGTACA 12060

GGGAAAATCa GAGATGAAAT GGATATACCA GCACCAGGAG ACATCAGAAA TGTGATTGCA 12120

GCTTTAAGTG ATTCTGAGTC AGTTTTAAGC CAACTTTTTC AGTATCGCTT TGAAGAAAAT 12180

CAAATTAGCG GTCACTCATT AGGTAATTTA TTAATCGCAG GTATGACTAA TATTACGAAT 12240

GATTTCGGAC ATGCCATTAA AGCATTAAGT AAAATTTTAA ATATTAAAGG TAGAGTCATT 12300

CCATCTACAA ATACAAGTGT GCAATTAAAT GCTGTTATGG AAGATGGAGA AATTGTTTTT 12360

GGAGAAACAA ATATTCCTAA AAAACATAAA AAAATTGATC GTGTGTTTTT AGAACCTAAC 12420




GATGTGCAAC 


CAATGGAAGA 


AGCAATCGAT 


GCTTTAAGGG 


AAGCAGATTT 


AATCGTTCTT 


12480 
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GCGTTAATTC ATTCTGATGC GCCTAAGCTA TATGTTTCTA ATGTGATGAC GCAACCTGGG 12600 

GAAACAGATG GTTATAGCGT GAAAGATyAT ATCGATGCGA TTCATAGACA AGCTGGACAA 12660 

CCGTTTATTG ATTATGTCAT TTGTAGTACA CAAACTTTCA ATGCTCAAGT TTTGAAAAAA 12720

TATGAAGAAA AACATTCTAA ACCAGTTGAA GTTAATAAGG CTGAACTTGA AAAAGAAAGC 12780

ATAAATGTAA AAACATCTTC AAATTTAGTT GAAATTTCTG AAAATCATTT AGTAAGACAT 12840 

AATACTAAAG TGTTATCGAC AATGATTTAT GACATAGCTT TAGAATTAAT TAGTACTATT 12900

CCTTTOGTAC CAAGTGATAA ACGTnAATAA TATAGAACGT AATCATATTA TGATATGATA 12960 

ATAGAGCTGT GAAAAAAATG AAnATAGACA GTGGTTCTAA GGTGAATCAT GTTTTAAATA 13020 

AGAAAGGAAT GACTGTACGA TGAGCTTTGC ATCAGAAATG AAAAATGAAT TAACTAGAAT 13080 

AGACGT 13086 
(2) INFORMATION FOR SEQ ID NO: 17:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1350 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

CATTAGTCAT GAAAATAGCC GACAACTTCA TCTGTGAAAT CACCGGCCTT TTATTTTAGC 60 

TAACTTTATT TCTGATTTTA CGATTTTAAT TGATCATACA GAGAAAGTGA TCTTTTTACA 120 

ATTTCTAAAA ACTCATGATC TATATTGGAC ATTTGATGAA AATAAGACAA AATGTTTTCT 180

GTTAGCTTCT CTTGTTTTGG GAATGAATCA TCTTCTTTAA TCCAAATCGC TAATTCGCCT 240

AATGGTGTTT TATCATCTTT AAATGTTTGT ATATATTCGT AAAAGCTCAT AGTATTCCTT 300

CTCTCAATTT ACTTATATAA ATCCTACCAC GAAAGCTTTC AAGAAAACAC AATTAAATGT 360

CTATTTAGTG AACTTTTTAA GGTTGTGCAC TCTTTTAATG TCTGCCAATT AGGTCAATTA 420

ATCATCACAA TGTACAATTA ACTCTATTTT CAGTTCATAT ACTCACACAC CGTTTTTGAA 480

CAACACATTA ACTTCTCATT TAGATAAAAC GCAAAAAAGC CTGGCACCAA TACAATAGAT 540

GCCAGACTAA GAGTCTACTA TATAAATTTA TTTAGCGTAT GGTTTTACTT CGATTGCACC 600

TTCATTTTCA TCATGAACAC CATGCTTATA ATAATCAATA TATTGTGGCT CTAAAGGCTT 660

TCTGCCACGT ATAATGTCTG CTGCTTTTTC AGCTAACATT AAAACAGGTG CGTGTATATT 720

GCCATTTGTC GTACGTGGCA TAGCTGATGC ATCAACTACA CGTAAATTTT CCATACCGTG 780 
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ACTACAAGAT GGGTGTAATG CTGTTTCACC ATCTCTACGA ACCCAATCAA GAATTTCTTC 900 

GTCTGTTTGC ACTTCTGGTC CTGGTGAAAT TTCTCCACCA TTGAATGGAT CCATTGCTTT 960 

TTGAGATAAG ATATTTCTTG CTACACGAAT TGCTTCTACC CATTCTTTTT TATCTTCTTC 1020 

TGTTGATAAA TAATTAAAGC GGATACTTGG TTTTTCGAAT GGATCTTTAG ATTTGATTTT 1080 

CAAGCTACCA CGAGAGTTTG AATACATTGG TCCTACGTGA ACTTGATAAC CATGTGCGAC 1140 

CGCTGCCTTT TGACCATCAT ATCTTACAGC TATTGGTAAG AAATGGAACA TTAAGTTAGG 1200 

ATAAtCAACT TCGTTATTTG AACGTACAAA TCCGCCACCT TCAAAATGGT TAGATGCTGC 1260 

TGCACCTGTA CGTGTGAAAA TCCATTGTAA ACCAATAAAT GGcATGCGCT TGAtATCTAA 1320

GCTTGGCtGt AATGATACAG GTTCCTTACA 1350 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1376 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
TAATGCTATT GGCAACACCA TATATGAAAn CTCCAAACGA TCCTAAACCG ACTATAGATT 60 

CACCAAATTT nACAATCCAT GAATAAAGTA GTGGCCATAA GAATAACAAT ATGACAACTA 120

AAAATGTACA GTAAAATGCA GTCATAATTG GAACTAGACG TTTACCACTA AAAAATGATA 180

ATGCTAATGG TAATTCTGTT TCACTAAACT TATTGTATGC ATAAGCTGCT ATTAAACCTA 240

TTACAATACC AACAAAGACA TTGCCATTAT TCATCTTTTC AAAAGCTGAA TTTATTTCCG 300

ArGCTTTCAT TCCTAATAAA GGCGCTAATT TCATTGGTGA TAATACAACT GTAACTAAAA 360

AATATCCTAA CGTrGCTGCA rGCGSGACTG CACCATCATT TTTCTTTGCC ATTCCTATAG 420

CTACACGAAT TGCAAATAAA ATACCTAATT GCTCTAAAAT CGTAGTACCT ACCGTAGTAA 480

AGAACATTGC GATTTTCGGC GTCGCATGAA GTGCATTTAA CGTATTACCA ATTCCGGCAA 540

TAATTGCTGC AGCCGGTAAA ATGGCAACTG GTAACATTAA CGAACGCCCT AAATTTTGGA 600

AAAATTTATA CATTGAATGT CATCCTTCTT AAAATAATGT AGAAATATAA AGATTACTAA 660

TGTAACTAGA ATAACTACTT CGATACTCCG TTATAGTCAC CTAGGCTTAC TAACCAGCTA 720

TATTTCTACC TCAAGTTATT TTATAAACTT TTTACAATTT CATGCAATTC TTGTTGTAAC 780

TTTGCTGTTC GTGTTTCAAT CTCTTTTGTA ATATAATCGA TACGCTCGTT TCGTTTTAAA 840
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AAAGACCGTG AATCTTAGTA GGACCAACAT AAGCAACAGG TAATATTGGT GACTTACTTA 960 

ACATTGCAAT TGTTGAAGCA CCaCGTTTCA AAGGTGCACC TTCTTOCOAT GTGCGAGAAC 1020 

CTGTTGGGAA GATACCAACT GTCTTATTAT CTTTCAACAA ATTGATTGGG CGTTTTAAAG 1080

TACTAGGTCC TGGATTTTCA CGATCTACAG GAAATGCATT TAAAGACGTT AAAAATTTAC 1140

CAATCCATTT ATTTTTGAAT AATTCTTTTT TAGCCATATA ATGAATTTGA TTAGGATATA 1200

ATGCCATACC TAGCATAATG ACTTCGTTAT AACTTTCATG CGTACAAGTT ACGACATATT 1260 

TACTATCCTT AGGAATATTA TCTTTACCOA TTACGTATAA TGATTTTGAC ATTTTAACTA 1320 

AAATGAAATT CAAAATCTTA CTAATCACTG AATACATTGT GCCACCTACT TAACTT 1376 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7363 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:






TTGTCATACC AATATTTTGT AAAATATGGA ACACAAGTAA AGTGACGAAA CCAACGATAA 60

AGATTTTGTT AAATTGATCT TCAATTTTCG CAGCTAATCT TATTAGATGG AAGATTAAAA 120

ATAAAAATAT TAAGATCAAT ATGACAGAAC CGATAAAGCC AAGTTCCTCT CCAATCACTG 180

AAAAGATAAA GTCAGTATGA TTTTCAGGTA TATAAACTTC ACCGTGATTG TATCCTTTAC 240

CTAGTAACTG TCCAGAACCG ATAGCTTTAA GTGATTCAGT TAAATGaTAG CCATCACCAC 300

TACTATATGT ATAGGGGTCA AGCCATGAAT TGATTCGTCC CATTTGATAC AGTTGGaCAC 360

CTAAJAAATT TTCAATTAAT GCGGGTGCAT ATAGaATACC TAAAATGACT GTCATTGCAC 420

CAACaATACC TGTAATAAAG ATAGGTGCTA AGATACGCCA TGTTATACCA CTTACTAACA 480

TCACACCTGC AATAATAGCA GCTAATACTA ATGTAGTTCC TAGGTCATTT TGCAGTAATA 540

TTAAAATACT TGGTACTAAC GAGACACCAA TAATTTTGAA AAATAATAAC AAATCACTTT 600

GGAATGATTT ATTGAATGTG AATTGATTAT GTCTAGAAAC GACACGCGCT AATGCTAAAA 660

TTAAAATAAT TTTCATGAAT TCAGATGGCT GAATACTGAT AGGGCCAAAC GTGTACCAAC 720

TTTTGGCACC ATTGATAATA GGTGTAATAG GTGACTCAGG AATAACGAGC AAGCCTATTA 780

ATAATAGACA GATTAAGAAA TACAATAAAT ATGTATAATG TTTAATCTTT TTAGGTGAAA 840

TAAACATGAT GATACCTGCA AAAATTGCAC CTAAAATGTA ATAAAAAATT TGTCTGATAC 900
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TTCCTAAAAC AGCTATAGTG GCTACTAATA 
CCCAGTCTAC TTTGCGAAnC aATGCTTATC 1020

CGGCTGTTGA CGAGATGAAT AATTCATTGC AAACTCCTTT TATACTCACT AATGTTTATA 1080

TCAATTTTAC ATGACTTTTT AAAAATTAGC TAGAATATCA CAGTGATATC AGCTATAGAT 1140

TTCAATTTGA ATTAGGAATA AAATAGAAGG GAATATTGTT CTGATTATAA ATGAATCAAC 1200

ATAGATACAG ACACATAAGT CCTCGTTTTT AAAATGCAAA ATAGCATTAA AATGTGATAC 1260

TATTAAGATT CAAAGATGCG AATAAATCAA TTAACAATAG GACyAAATCA ATATTAATTT 1320

ATATTAAGGT AGCAAACCCT GATATATCAT TGGAGGAAAA CGAAATGACA AAAGAAAATA 1380

TTTGTATCGT TTTTGGAGGG AAAAGTGCAG AACACGAAGT ATCGATTCTG ACAGCACAAA 1440

ATGTATTAAA TGCAATAGAT AAAGACAAAT ATCATGTTGA TATCATTTAT ATTACCAATG 1500

ATGGTGATTG GAGAAAGCAA AATAATATTA CAGCTGAAAT TAAATCTACT GATGAGCTTC 1560

ATTTAGAAAA TGGAGAGGCG CTTGAGATTT CACAGCTATT GAAAGAAAGT AGTTCAGGAC 1620

AACGATACGA TGCAGTATTC CCATTATTAC ATGGTCCTAA TGGTGAAGAT GGCACGATTC 1680

AAGGGCTTTT TGAAGTTTTG GATGTACCAT ATGTAGGAAA TGGTGTATTG TCAGCTGCAA 1740

GTTCTATGGA CAAACTTGTA ATGAAACAAT TATTTGAACA TCGAGGGTTA CCACAGTTAC 1800

CTTATATTAG TTTCTTACGT TCTGAATATG AAAAATATGA ACATAACATT TTAAAATTAG 1860

TAAATGATAA ATTAAATTAC CCAGTCTTTG TTAAACCTGC TAACTTAGGG TCAAGTGTAG 1920

GTATCAGTAA ATGTAATAAT GAAGCGGAAC TTAAAGAAGG TATTAAAGAA GCATTCCAAT 1980

TTGACOGTAA GCTTGTTATA GAACAAGGOG TTAACGCACG TGAAATTGAA GTAGCAGTTT 2040

TAGGAAATGA CTATCCTGAA GCGACATGGC CAGGTGAAGT CGTAAAAGAT GTCGCGTTTT 2100

ACGATTACAA ATCAAAATAT AAAGATGGTA AGGTTCAATT ACAAATTCCA GCTGACTTAG 2160

ACGAAGATGT TCAATTAACG CTTAGAAATA TGGCATTAGA GGCATTCAAA GCGACAGATT 2220

GTTCTGGTTT AGTCCGTGCT GATTTCTTTG TAACAGAAGA CAACCAAATA TATATTAATG 2280

AAACAAATGC AATGCCTGGA TTTACGGCTT TCAGTATGTA TCCAAAGTTA TGGGAAAATA 2340

TGGGCTTATC TTATCCAGAA TTGATTACAA AACTTATCGA GCTTGCTAAA GAACGTCACC 2400

AGGATAAACA GAAAAATAAA TACAAAATTG ACTAACTGAG GTTGTTATTA TGATTAATGT 2460

TACATTAAAG CAAATTCAAT CATGGATTCC TTGTGAAATT GAAGATCAAT TTTTAAATCA 2520

AGAGATAAAT GGAGTCACAA TTGATTCACG AGCAATTTCT AAAAATATGT TATTTATACC 2580

ATTTAAAGGT GAAAATGTTG ACGGTCATOG CTTTGTCTCT AAAGCATTAC AAGATGGTGC 2640

TGGGGCTGCT TTTTATCAAA GAGGGACACC TATAGATGAA AATGTAAGCG GGCCTATTAT 2700
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AAACCCTAAA GTAATTGCOG TCACAGGGTC TAATGGTAAA ACAACGACTA AAGATATGAT 2820 

TGAAAGTGTA TTGCATACCG AATTTAAAGT TAAGAAAACG CAAGGTAATT ACAATAATGA 2880 

AATTGGTTTA CCTTTAACTA TTTTGGAATT AGATAATGAT ACTGAAATAT CAATATTGGA 2940

GATGGGGATG TCAGGTTTCC ATGAAATTGA ATTTCTGTCA AACCTCGCTC AACCAGATAT 3000 

TGCAGTTATA ACTAATATTG GTGAGTCACA TATGCAAGAT TTAGGTTCGC GCGAGGGGAT 3060 

TGCTAAAGCT AAATCTGAAA TTACAATAGG TCTAAAAGAT AATGGTACGT TTATATATGA 3120 

TGGCGATGAA CCATTATTGA AACCACATGT TAAAGAAGTT GAAAATGCAA AATGTATTAG 3180 

TATTGGTGTT GCTACTGATA ATGCATTAGT TTGTTCTGTT GATGATAGAG ATACTACAGG 3240

TATTTCATTT ACGATTAATA ATAAAGAACA TTACGATCTG CCAATATTAG GAAAGCATAA 3300 

TATGAAAAAT GCGACGATTG CCATTGCGGT TGGTCATGAA TTAGGTTTGA CATATAACAC 3360 

AATCTATCAA AATTTAAAAA ATGTCAGCTT AACTGGTATG CGTATGGAAC AACATACATT 3420

AGAAAATGAT ATTACTGTGA TAAATGATGC CTATAATGCA AGTCCTACAA GTATGAGAGC 3480 

AGCTATTGAT ACACTGAGTA CTTTGACAGG GCGTCGCATT CTAATTTTAG GAGATGTTTT 3540 

AGAATTAGGT GAAAATAGCA AAGAAATGCA TATCGGTGTA GGTAATTATT TAGAAGAAAA 3600

GCATATAGAT GTGTTGTATA CGTTTGGTAA TGAAGCGAAG TATATTTATG ATTCGGGCCA 3660 

GCAACATGTC GAAAAAGCAC AACACTTCAA TTCTAAAGAC GATATGATAG AAGTTTTAAT 3720 

AAACGATTTA AAAGCGCATG ACCGTGTATT AGTTAAAGGA TCACGTGGTA TGAAATTAGA 3780

AGAAGTGGTA AATGCTTTAA TTTCATAGAG ATTAGTCGAG GGACCTTTTA CTTATAAAAA 3840

TGATTTGAAT TAATACTAAA AGATTACAAA GAAGAGGTGG TTTTGTGTGT AAATACAAAA 3900

TTGCCTTTTT CTTTTTATGT TAAATCTATA AATTTGAAAC TAAATGAAGG TTAATTCTAT 3960

GTACACACTT TATATAGGAA GTAGTTTGAA TGTTTATATA ATGTTTTACA AAAAGATGTA 4020

GTATTATAAT GTCTAATTTC ACATGTGTTT CAGTAAAATT TGTTGTGGAA TGTTAACGAT 4080

ATACGTATTT TATAAAAaAT TTTTTATAAT GATTATTCGA ATGATGCGTA ACGCTTACAT 4140

CTTATCTAAT GCTAGCTTTT TGACAAAAAT ATGACAATCA ATTAATGTGA TTCTAATAAA 4200

TATTCGCAAA TTGCTTTATT GCGATTAAAT TTTTTTGGTG GTACTATATA GAAGTTGATG 4260

AAATATTAAT GAACTTATAT GCAAAAGTAT ATTGAGAAAT AAACAGGTAA AAAGGAGAAT 4320

TATTTTGCAA AATTTTAAAG AACTAGGGAT TTCGGATAAT ACGGTTCAGT CACTTGAATC 4380

AATGGGATTT AAAGAGCCGA CACCTATCCA AAAAGACAGT ATCCCTTATG CGTTACAAGG 4440

AATTGATATC CTTGGGCAAG CTCAAACCGG TACAGGTAAA ACAGGAGCAT TCGGTATTCC 4500
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AGAATTGGCA 


ATGCAGGTAG CTGAACAATT AAGAGAATTT AGCCGTGGAC AAGGTGTCCA 4620

AGTTGTTACT GTATTCGGTG GTATGCCTAT CGAACGCCAA ATTAAAGCCT TGAAAAAAGG 4680

CCCACAAATC GTAGTCGGAA CACCTGGGCG TGTTATCGAC CATTTAAATC GTCGCACATT 4740

AAAAACGGAC GGAATTCATA CTTTGATTTT AGATGAAGCT GATGAAATGA TGAATATGGG 4800

ATTCATCGAT GATATGAGAT TTATTATGGA TAAAATTCCA GCAGTACAAC GTCAAACAAT 4860

GTTGTTCTCA GCTACAATGC CTAAAGCAAT CCAAGCTTTA GTACAACAAT TTATGAAATC 4920

ACCAAAAATC ATTAAGACAA TGAATAATGA AATGTCTGAT CCACAAATCG AAGAATTCTA 4980

TACAATTGTT AAAGAATTAG AGAAATTTGA TACATTTACA AATTTCCTAG ATGTTCATCA 5040

ACCTGAATTA GCAATCGTAT TCGGACGTAC AAAACGTCGT GTTGATGAAT TAACAAGTGC 5100

TTTGATTTCT AAAGGATATA AAGCTGAAGG TTTACATGGT GATATTACAC AAGCGAAACg 5160

TTtAGAAGTA TTanAGAAAT TTAAAAATGA CCAAATTAAT ATTTTAGTCG CTACTGATGT 5220

AGCAGCaAGA GGACTAGATA TTTCTGGTGT GAGTCATGTT TATAACTTTG ATATACCTCA 5280

AGATACTGAA AGCTATACAC ACCGTATTGG TCGTACGGGT CGTGrCTGGTA AAGAAGGTAT 5340

CGCTGTAACG TTTGTTAATC CAATCGAAAT GGATTATATC AGACAAATTG AAGATGCAAA 5400

CGGTAGAAAA ATGAGTGCAy TcGTCCACCA CATCGTAAAG AAGTACTTCA AGCACGTGAA 5460

GATGACATCA AAGAAAAAGT TGAAAACTGG ATGTCTAAAQ AGTCAGAATC ACGCTTGAAA 5520

CGCATTTCTA CAGAGTTGTT AAATGAATAT AACGATGTTG ATTTAGTTGC TGCACTTTTA 5580

CAAGAGTTAG TAGAAGCAAA CGATGAAGTT GAAGTTCAAT TAACTTTTGA AAAACCATTA 5640

TCTCGCAAAG GCCGTAACGG TAAACCAAGT GGTTCTCGTA ACAGAAATAG TAAGCGTGGT 5700

AATCCTAAAT TTGACAGTAA GAGTAAACGT TCAAAAGGAT ACTCAAGTAA GAAGAAAAGT 5760

ACAAAAAAAT TCGACCGTAA AGAGAAGAGC AGCGGTGGAA GCAGACCTAT GAAAGGTCGC 5820

ACATTTGCTG ACCATCAAAA ATAATTTATA GATTAAGAGC TTAAAGATGT AATGTCTTGA 5880

GCTCTTTTTT GTTTTCAATA ATTGATTCTC TGTAGATATC aAAGTaCTAA CGTTTTAAAG 5940

GTTAAATATT TAATTGGATT GAGATCTGTA TGCGGTTATA TCaTTCTGTG TAAATATGGT 6000

TCTCCACCAA ATGTGGTGAG TATATAATTT AAAGAACTAT TTTTAAATTA AGAATAATCG 6060

AACATAAATA AACTTTATGA AATTTCAGTA TCATGTTCTT ATAAAAAACA ATAGGGCTTT 6120

TTGctGACGC TAGTGCGCGA TAAATAATAA GTTGAATATA AAAAAGATCA CTGCCAATCA 6180

TTCGTTTAAT GGCAGCGATC ttttttattt AATTATTTCT CTTTCCACTG CAACATTTGA 6240

TAACCAATGC GTGGATGTGT TTTAATAATA TCTTTTGCGT CCTCATGACA TTGTGAAAGT 6300
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CCATATATTC GTTTTAATAT CATCTCATAA GTGAQTACTT TTCCTTTATQ ATTTQACAAT €420 

AGTTCTAACA AGCTAAATTC ATTTGGCGTC AAATGTACCT CCTGATTATT AATAACAACA 6480

GATTTGGAGC CAAAGTCGAT GCTTAGCAAA CCGTTAGTAA ATACAATGTT AGTTTCTTGA 6540

TGTGACTTAG CGATTCTCTC GATGACTCGT ATTCGTGCCC GAAGCTCATC AACATTAAAA 6600

GGTTTAGTCA TATAGTCATT CGCACCGTTA TCTAAAGCTT GAATAATTGT TTGTTCTTCT 6660

TGTCTTGCAC TTATTACAAT GATAGGAATG TCAGTATGTT GCCTGATTTC TGAAATCAAA 6720

CATAATCCAT CTTTATCTGG TAAACCTAAA TCTAATAAAA TGACATCTGG TTTATCAATT 6780

TGAATTTTAA AGTGTGCTTG TGTGGCATTG TCGQCTGTAG TTACATTGTA ATAATCTAAA 6840

GTTAATGCAA CATCAAGTAA ATGTGTGATT GCGTGATCAT CTTCAATTAT CAATATTTTA 6900 

GATTGCATTA TACGTCTCCT TCGTTAAAGT CTOTATATAT ATTGAAATAG AATATACTGC 6960 

CGTGTGGTTG GTTCGGTTTA TATTGTAAGT TTGATTGATG TTTGTGTAGG ATAGTCTGTA 7020

CTAAATATAA GCCTAGTCCC ATGCTTTCTT TTTGGTTATC TTTAAAATAT TTATTTGATC 7080 

CTGTGTAAAA AGGCTCGAAT ATCTTTTGTt GTTCTTCTAA ACTAATTCCA GGTCCTTCGT 7140

CTATAACGGC AAATTCGATT TGTTCATAGC TAGCATAACG AATAGATAAA TTGATTTTGG 7200

TGTCAGTAGA AGTGTGTTTA ACTGCATTTT CAATCAAATT GAAtAAAgCT TGTAAAATCA 7260 

ACTTACTGTC AATGTGTATA AACtGTAAAT TTACTGAGGA TGATACAGTT ATACGCTTTT 7320 

TTAAATGGCG ACGTTCTAAA ATACATATCG ATTTCTTATA CTA 7363

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10470 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:



TTAACAATCG ATAACCACAA TACTTCTATT GTAATTGTTT AACGATTTCn CGATTAAAAT 60

CATCTAAATC GTCTGGTACT CGACTTGTTA CAATATTGTT GTCTACAcTa CTGACTCATC 120

AACTACATGT GCGCCTGCAT TTGATAAATC TTTGCGTACA TTTAATACTG CTGTTAACGT 180

ACGACCTTTT AAATCGTCTG TATCTATTAG TATTTGTGGC CCATGACAAA TGGCAAATGT 240

TGGTACATCA TTTTTAGTAA AGTATTTAGC AAATGTGCCA TATCGACCTT CTGTATCTCC 300

ACGTAAATGA TCTGGTGAAA ATCCTCCAGG AATTAATAAT GCATCATAAT CTTCTGGTTT 360




EP0 786 519 A2 



ATTTGCAGTA TCTCCAATCA CTACAGTATT AAAGCCTGCA TTCTCTAATO CCTCTTTAGQ 480 

GCTTGAATAT TCTATATCTT CAAATTCGTT TGCTAGAATA ATTGCTACTT TTTTAGTCAT 540 

TGAAAATCAC CTTTCTATAT ATCATTGATA TAATTACTAT AGACAAGTAA ATCAGTGATT 600

AAACATACAA GATATAAAAA ATATTAAGCG ACTGTCGCGA TATCTAACCC TAACACATCT 660

TATGTGGCAT TTACTTAGAT ACTAATTTAA CCTTTTCTTC AAGCTGATCT AACAATCCAA 720

TCCATTCATC TATATCTTCA ACACGTACTT CATCAGGATT TACATGATCG ATATCCTCAA 780

TAAACTTATT TAAACGCGCT TTTATCTGTT CGATTGTTTG CTGTTCATTC ATAAAAAGTT 840

AACTCCTTTT ATTTTGTTTT CTTTTTCATT ATTATCCTAA CAGAAATTGC GTTAAAGCGA 900

TATAATCTTA GCTATATTTA TGACATTCAA ATTATTTTGA CTTTTAAAAA TCCCCTTTTC 960 

AATTAACTAA AATTAAGAGA TAATTTGTTA CGAGTGATAA TACGAaGkGG TaTCATACCG 1020 

ATATGAACCA AATAGAAAGA AGGAAGTTTA AGACGATGAA TAGCGTCAAA TTGAAGCAAC 1080

CTGTTAGCAT TTACAATOAT CCATGGGAAQ TGAAATTTAT ATACATTTAA ATTTCATGAG 1140 

ACAATAAACG TTGATTTAAT GCGTTTTTTT GCCTTTTTTA TTTTCCTTAT TTTTTCTGTT 1200 

TTACAACAAA ATGGTATCAA AAATGGTATC ATTTGTAGTT ATTTTAGCTT CACATATTAA 1260

AACAACCACA CTCCTAAATT AATAGGTGGT GTCGTTTTGT TGGTTGTGTG GGGATAAAAA 1320 

TAACCGCATC AGTTAAGATG CGGTTATCTA GCAAGGGCCA CGTATTTATA AATACGTTTA 1380 

GAATCTCTTC GGCAACTTTG CTATAGACAG TCTATGCTGT TACTAAATTA TACCACCACA 1440

CAAACCTACT CCCATTCAGG AAGACAGAGC TTTGTCGCTC GTCAGCAACG TCATATGAAT 1500 

TCTCAGTTCA TGTTGTGGTG ACACTTTAAA CGGTCTGTGC CAGTAGCGAC CGAGTCATTT 1560 

CAAGAATGAC CATTTCACAT TTATATTATA ACACTTGTCG TGCGTAACTG TATAGTTTTT 1620

CAGfTGTATT TAAAGTTAAG TTATCTACTT CGCGCTTTCC TTGCCTTAAT TGTGAAATTA 1680

CATATTGCGC TACGCCAGTT TGTTTGTGAA TTTGGTAACC TGTTATATCA CTTTTGATCA 1740

ATTCAATTAT TTTTAATTTA TAATCACTCA TATTATCTAC GTCCATTCTT TTTATCTAAA 1800

CAATAAAAAT GTGTCTTTCT CCCGATAAAT AATAACAATG GTAGGCTTAA TAAAAACAAT 1860

ATTAAATACA TTTGTTCTGT CATAATTGAA AACCTCCAAA TAATATTATA TTATATAAGT 1920

GTAAGGAGGA GCCATCAGGC TCCAAGCATA ATGTTAATCT TTGTTGTTTG GCTTTCGGTC 1980

TAGGTAGCCG AGATGCCaTT CTCTAAGTTG TTTTAACACT TCTGGAATTA TCAGTACTGC 2040

CAATACTTGA TGTTCTAGAA GTGTTTTTAT TATGTCTAGC ATGAGGCTTT TCACCTCCTT 2100

ACACATAATT TGTAAGTCAT CAACTAACCT ACAAATATAA TTATACTAAA CAAATGTTTA 2160 
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GTTATCTACA TTTAAATCTT GAGAGAAATG TTAAAAAGTT CTAGTAAAAT AATAGCACAT 2280 

TTTATCTTTA AATGTAAATA GAAAGCAGGT ATOTAACOCA CCTGCTTAAA TAGaCATGAC 2340 

TATGTCATTC TAACTGATTT CTCCCCATAA GTCACCTAAT ATCTGATTAG GTGGGGCAGA 2400

ACCATTCCAT GTTCTAATAG GCAAGTAATA ACGTTGCCCC TCCCATGTAT ATCCTACCCA 2460 

AACATGACCA TCTTGTAACA TCACTTCTGT ATAATCACAA TACCCACCAG GTTGGAACTG 2520 

ATAACCCACT GGACAAGATA AGAATGGCCC CACTTTTCTT ACTGTGATTG GTTGATTGCC 2580 

GTTTGTGAAT CTAGCACTTT CTTCCATGTA GTAAGTACCA TATTTATTAC GTTTCCATGC 2640 

ACTTGCAACT GGTTTAACTG TATTACTTGA AGCGCTTGAC TCATTAGAGA CAGTGGCAAC 2700

OGGTATTTTA CCATCCATGT ACGCCCTAAT CTGCTTGATA AAGTAGTCTT TAAGTTGCAA 2760

CCGCTTGTCT TCTGGCAATA GACCGCGAGT TACTGGGTCA AAACCAGTGT GTAAAACCGA 2820

ACTTCTATGA GGGCATGATG TTGAAGTAAA TTCATTGTGC AATCTGATTG TATTTCTGTT 2880

TGCTGGTAAT CCCCATTTTT TCAACAATCT AGCGCATTCT TGGAAAGTTG CCTGTTCATT 2940

TTTTAAGAAT GTCGCGTTAT CTGCGCCCAT TGATTGACAT ACTTCAATAC CGTAATAATA 3000

TTTATTACCT ATTTGATTAG CGGTATGCCA ACCTACTTGT GATTCATCTA AGGCTTGCCA 3060

AACTGTGTTG CCTGATACGT AACTATGCGC AATGCCCGCT TCTAATCTTG ATAAAGGTGC 3120 

ATTTACTAAT CCGTTACGAT ATGCTTCAGC AGTCGCCCCT TTGCTCCCTG CGTCGTTGTG 3180 

TATAACTATA CCTTTAGGGT TACTACCACG CTTAGGTAGG TCATAACCTT TAACCACATC 3240 

TTTGATGATT TTAAGTTCTA CTGCTTTAGG TTGTGGCTTA GCTGTTTCTT TTTTAGGTGC 3300 

TTGTGTAGGA GATTGAACTG ATCGTGGCGC TGTCTCACTT TTAAAATTCG GACGGATAAA 3360 

CCACATAGGG AAATCATAAG CATGTTGTCG TCTTGTAACT TTTTCCCAAC CCCAGCCGGG 3420

TTGTTCGATT CCGTCAGTCC AGCCACCGCC TAGCCAATTC TGCTCATATA CAATGATGTA 3480

ATCTAAAGTT GCTTCAATTA CCCATGCAAC GTGACCATAT CCAGCACCGT AGTTGCTACC 3540

GAATACCACC ATGTCGCCAG GTTGTGCTAA GAAGTCCGGT GTATTTTGGT ATACAGTAGC 3600

TAATCCGTCG AAGTTGTTAG CGAACGGAAT ATCTTTTGCA CCTAAACCTT TTAGAAGTAA 3660

TCCAAACAAA ACTTTCCAAC CAGCATTGGC ATAATCAAAG CATTGAAATC CATACCATAA 3720

GTCCACATTG AATTGTTTTC CCTCAGAAGT TTTCAACCAC TCTATAAACT CATTTTTAGT 3780

TAATTTTGCT TGCATTGTCG CCACCTCCAT GATCATACTC ATTCACATCA AAGCCAACAT 3840

CGTTAGAGGC GTCTGTGAAA GGTTGTGATG TATCATATTC TTTTGGTGcT TTCGCGCTTA 3900

ATTCCGGCGT TAAACTACTG TCTTGTGATG ATTTCCACGT AACTTGTTGT TCTTCTTTTT 3960 
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TTGGGTCAGT AATAACGCCA ATACCTGTAA 
GTAACGTGAG GATAGCGCCT ATAATTGCGC 4080

TAGCTTGATT TAATTGAGTA GATAAATCTA ATCCGAATAA ATCCGTGACT TGCTTGATAA 4140

ATAGCAACAA TGCTCCAACT AAACCAGTTA GTACTGCTTT GTTTTTGAAT CTCAATTTCC 4200

AGTTAATATC CATTTGTTTG CTCCTTTTAT CCAAAATAAA AAAACGACTA AAAATTAGTC 4260

GTTTAAAATT ATTCAATGGT CAATGTCGGA GATCCTGAAT AAACATCACT TATAGTGACG 4320

TACAACATCC CTGAAGGATT ACTAAAGTTG ATATTTTTAC TTGCAACTCC GCTATTGACT 4380

CCTGATATTC CTAAATCACT TGACCCTAAA TTAGTTTGCG AAATCCTCAT TATACCGCTA 4440

CGTACATTTT CTATTGTCAC CTGATAACTT TTATTGGGTT CAACTCCATT TATTGTCCAT 4500

TTTGCTGTTG ATTCTTCTAT GCTATCCGGA TATTTATTTT TAGGTAAGGG TTTTATTACA 4560

AAAGATGAAG GCTTTTTCCA TACTTGGATA TTTCCAGCAT ATACTTTTGT ATATTCTTCA 4620

CCTTCGTAAA TAAACTTCTT TACATTTTTA AAATTACCTT CCATAAAAAT CACCCTTTAA 4680

TTAAATATAA CGTATTCGGG TCTTTTTGAT ATATATAGTT ATATTCATTT TCTGTTCCTG 4740

TCCAAATTTT AACCGTCGGT TGAGATGCGC TTTTTAGTTG ATATAAATTA TCCGCTTGTT 4800

GTTTAGTAAA AGCTTGAGAT GACAAAACAT ACCGCTCGTC ATGATTATGA TTTTTTGGAG 4860

CATATAAATC ATTTAGTGTT TGTTTGAATT CCTCAAAATC TTCTGTATTA ACTTTTGAGC 4920

CAATCTGTTG CAATACACTT TCTGAAATAG AGTTGTTTTG TATTGCTTCT GCTAATTCTC 4980

TTAATGTGTT CATAGATTCA GGCGCGCTAT CAACTAGTTC AGCAATTTTT GTATCOGTAT 5040

ACGTTTTAGA GTCGTTGAGA GTTGTATCTT TGATTTTTTC AACTTCTTGC AATTTATTTT 5100

CTAACCCTTC AACATTTGCG ATATTGATTT TGTCCAATAA CTCAGGTTCT GCTTTGATAT 5160

CTGTATCTTT ACCATCAATT TGCCACATTT TAGTGTCAGG ATTGATTGAT ACTACAGTAC 5220

CGTTTTTACC GGGTGCGCCT TGTTCTCCTT TTTTACCTGC TTCACCTTTT GCTCCAGGTT 5280

GTCCCGGTTC ACCTTTATCA CCTTTCGCAC CTTTAAATCT ACTTTCATTC TTTTCGATGT 5340

AAGAAATGAC ATCTTTATCT ATTTTCTCTT TAAAGTCTTT GCTCAATAAA TCTGTCGCGT 5400

TATCTTTTAA AATTCTCGTA ATAGCATCAT CTACCAATTT AACATCGATT TCTTTTGCTA 5460

CAGCAGATTC AATACCACTA TCAACGATAT TGAAAGAAAA GTTTGCGACA TGTATTTTTT 5520

CTTCTTCTTT CTCTAAAAAC AGCTTACAGC GAACATAACC AGCGTGTTTG ATAACCTTTT 5580

TAGGTATCTT GTAGGTAAGG AAACCTTTTA CAACATCGTC GATAATAAGG GGCTCATTTT 5640

TGAATATAGA GCCATCTTCC ATAAACAAAT GTAATCTAGG TGTTAAGCCA TGTGCTTTTA 5700

GATCGATACG ACCTTGTTTG TCATTGATAC CTATTCTTAT AGATGCTGTA TTTTCATCTT 5760
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CAACATCTTT TATTTTGTAC ATTTACACAC CTCTTTATTT ATATTTATCC CTTGTGAAGT 5880 

AGATACCTTT TAAGCCGATT TGTTTATATA ACTTAGCOAT TGTACTTGCT TGATGTTGGC 5940 

ACCACTCTAT AGCAGTAGCG TATTGGTGGG TAGCTGGATT CTTAGGATTC CATCTAATTC 6000 

GGTACAATGT GTTTTGACCT TTATTGATGT AATCCTTTCT TACGAAGCTA GCACCGCCCA 6060 

TGATTGCTTT TGCTGGAGAT GTCCAACCTT TATTCCTTGC AAACGTCATT GCGTAGTTAG 6120 

GATTGTTGTC GTAAGCGCCA ATGCCGAAGT AGTTGTATAC TCCATCTTTT CCGTTAGCGA 6180 

AGTTACTTGT TCCATATCCA CTTTCTAAGA AAGCATGCGC GATTAAATAA ATTTCATTAA 6240 

TGTTdTGCTT TTTACAAGCT TCTGCGAACG CTTTACCTTG ATTATTCAAT GTTCCCTTAC 6300

CTTTAAGTAT CTTATTAAGT GCGCTAACTG AAACACCTTG ATACTTGCCT AAATTAAGCA 6360

TTTGGTAGCA TTGTGTGTTA CTTTCCCATA TACGCTTTAC ATTCATTGCT GAACTCGTTT 6420

GTGCTCGTGT AGCGTTAscC AACCCCAAGC ATTAGATTTT TTCGGGTTAC CTCTTGCCAT 6480

TTGTTTATCC AGTGCTTGTT TGAATGTATA AGGACTOGTT TCTGTTATGA TCTGCGGTTG 6540

TTTAGATGCC GAACCATTGT TGGCTGTTGG TGACGAGTCT CTTACATTAG CTATATCAGC 6600

GTTTTTATTA TCTACCATAA CTTTTATTCT AGATTTTGTT ACTGTTGGCT TAGTTATAGA 6660

ATTTAATAAT TTTTCTCTGT TTTTAAATAT ATTAAGTAAT GCCTTTTCTA ATGCTTCGTA 6720

TTTATCTTTA GGAGGAACAC CGTTGTCAAT CATATTCCAA TTAACATGTT CCAACATTGA 6780

ACGCCAAATG CTGTCGTCTA CTTTTAAATT TTCAATACTT AGAGGTATCT CATATTTGGC 6840

CATCATATCT ACAGCTACAA CCATTGCGTG AATCTCATTA AAAATAAATT CATTTTTACT 6900

CGCACTATAA TCTTCACATA CGTCTATAAC TATATAATCA GGTTCATTAG GAACTTCAAA 6960

TACAGCTCTT CTAGGTGCCC AAATATTATG TCTATCAACA TAAAAGTGGG GATATTCTAC 7020 

ATCCTGTTTG TATTTCTTCC TACTGTTATA TAAACTTTCT ACCGAGCTCA TCGTTTGTGC 7080 

GTTTCTAATC ATTATTCCTT TAGGTTTTTC GAGTCGTCGA TTACCTTCTA CTATAAAGTG 7140 

ATAAATATAT TCTGGATAAT TAACCTCTTG GCTAGAAATA GTGTACTTTA TAGTTGTTAC 7200

ATCTTTCCAA ATTGGAACTT TTTTATTATT TTTTTCGTTA TCATCACTAT CATCTTCTGG 7260

TTTAGGTGCC GGTGTAGTTT TGTCTGGATG ATATGGTGGT CTAACAAAAT ATTTAACCCC 7320

TCCACCTGGT CCATCATGAT AAGAGTGTTT AATTTTATAA GGTGGACTTC CTGTTGCGTT 7380

ATTTGTATAC CAGTTTTGAT CTACGCCATA CCAATAGTCT TTTGTGCATG GTCCCACTAC 7440

AATGTTTACA TGTCCTGCCC AACCACCAGT CCAAACACCC CAGTCGCCTG GTTGTGGTAC 7500

AAAATCTTTT GTATTTCTAA TTATCTTGAA ATCTCTACCT CTATAATTGG ATTTTTGAGC 7560 
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TAAATCCCAG CATTGTGCTC CCATTCCAGA ACCAGGTACA TCAAT AG CT A TTTTGTTTTT 7680 

AGCGATATAT AACGCCCATT CAACCACTTC ACTAGCTGTG GGCTTTCTAT TTTTCGGATT 7740 

AGGTAATCCC ATGTATGCAC CTCATTTCAA TCAAAATAAA AAGCCAGTGC CGAAGCACTG 7800

ACTCTTAACT GTTATTTACA TTTACCAAAC CAGAAGCACG CCCAQAAGCT ATATCCTAAA 7860

ATCCCTTTAA GCATGGTAAT CACCTCCTTT AAATACCAAA AACAGTTCTT AGTAAAGCTA 7920

TGACAATCGT ACTGAAGATA GTCCCTATCA AACCTAGAAT CCACATTTTT ATGTCTCTAA 7980

TATTCTTGGC ATTCTTTTCT TTATTCTTTT CATCTTCTAC CTTGTCGCGC TTTAATTCTT 8040

CAAAATTTCT ATCTAATTTG TCATAAATCT TTTCTTGCGC TCTAAGACTA TCTTCTATTC 8100

TGTCGAATTT TTCAAACATA GTCTTATCAT TTTCTTCTAA TCGCGTTAAA CGCCAATCTT 8160

GTTCATGTCG TTTGGTAAAT CCAAACATTA TGCCACCCAC TTTATTCAAA TTAAAAAGCC 8220

ACAAGCATTA CACCTGTGAC TTTTCATCTT TTGTTTCTGG ATATTTTTCT CCAGTGATTA 8280

AAGCGTATTC TTCTTTATCG ATTAAACCCT TGTCTACGTA CCACTTAATT TGCTCGTTTT 8340 

TATAGTAACC CCAAACATAA AAAGTTTTAA TGTCTTTAAA AOTTGOATAA ATCATCTTCA 8400 

TTATTTAAAC GTCCCCCTCA GTACTTGTTT TGTTAGTTTT CAGTTCAGTC AACTGTTGTG 8460

TTAACATAGC GTTTTGTTGA GCTAATTCCA TTGTTAATAC GTTTACTTGT GCCACCTGCA 8520

TTTGCATACT CGCAACCATT CCGCGAAGTT CCTCATCACT TAAATCTGAC GCACTTTGTT 8580

GGTTTGATGC ATTCGGTACG TCTTCTTTTT CGAAATTGCT ATTGTATTTA ATTTCGCCGT 8640

TAGTGAAAAC AAACTTTCTA GGTTCGAACT CTTCTTTAAA TTTAATAGGC ACATTGTTAT 8700

CATCTACATC TAAACTATTG CGTAAACCGC CAGTATTAAC GAATCCGATA ACTTCGTTTT 8760

TATCGTTTAC TGTGATTTTC ATTATTTCCA CCCCATAATT TTAGTTATAG TAACTTTGTT 8820

GGCATTCGCT CCAGAACCTG ATGTTTTACC TAAATCAAAG TACACATCGT TATCTATTCT 8880

TAAAGTAGTG CTACTTGTTT TGGATAGTAA GCACTCATAA ATACCGCCAC CGTTGCCGTC 8940

TGAGTCAACT ACATTCGCTT TACTCAATTG AATCGCGTTA GGTAATGCGG TTAGTCCGAA 9000

TCCCTCAATA ACGCCACCTG GATAAGTTCC ACTTACCAAC AAAATAGAAT AGTTTGTGTA 9060

CGGTTCAGTT AGATTGATTG TTGTACCTAC ACCATTTGCG CCACCGTCGA ACAATACCGT 9120

TGATTTATGT TCATTAGGAA CTGTCCACTG TTGCTCAAGT CTGCCGTTTG TGATTGATCG 9180 

TGTGTAAATC TTTTTAGAGT TATAAGGTGT GAAGTTAAAT AGCTTGTTTG TATCATCTTT 9240 

AACGAATACC GATAAATAAC CCTCATAACT TTCAACGCTA CCTGGTAAAT CCGGCACTCT 9300 

TGTTGCATAG TAATTACCAG CAGTTAAATA TCCCAAATCG CCTTGCGCAT TATTTAAGTT 9360 
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10 






GAATTTATCA TCTACATACT GCTTAGCTTG ATTTAAAGCG TTGTTAGACG TTTCTTCAAC 9480

AAATTGCTTA GTTAAGTTTC CATCATTCTT TTTATAAAAC GGGTACCATG TGCCGTAGAT 9540 

TTTGTATTTT GTGTACTCAT CGTTTGAATC GTCTGGGTAC CATGTTGCAC GAGCAGTATT 9600

ATTATCAACA ACATAAACAA CTAACACACC AGATTTGCTT GATGTATAAG TTGATTCATC 9660 

GAACGAAGAA CCGTCATCAA CACCATCTTG TCCAGGCTTC TCTAACGTGC CTATATCCGT 9720 

CTTTTCTGGC GCATCTGTTG CATTAGTAAT ATGAATAATC CTAGATGTGT TAACTGCGCT 9780 

TAAAACGCTA TCTATGGACT GCTCATACGA TTCAATTGCT TTACCGTAAT CATCTGTAAG 9840 

TTTAGACTTT TGCCAATTCG TTGTTGAATT ACCTTTAACA AGGTCAGCGC CATTGATTTG 9900 

TTGTTCAACT TCGTTAACAC GTTCAAAAAT CGCTTGCTCT TTTTCAACTA TTTTATCGAA 9960 

TTCAGCTGTA ACAGCTTGTG TTGCACTAGT TTGCGTCGCA GTAATAGCTT GTATAGCTTC 10020 

GTTTTGCTTG ATTTCGATTT GTTGAATGCC TTTTGTCGCA CTATCATTCA CTTTTGCTAT 10080 

TAACGTTTGT GTATCAGCCA TATTTTGCTT TAATTGGTTA AAATCTTTAC CGACAGCTTC 10140

GATAGTATCT TGAATAGATT TGATATAAAC AAGCTTTGTT ATACCATCAA ACCCACTAAC 10200 

TAAATCATTT TCAATATTGA AGCTAAATTG ACGTTCAACA ACAACATTAT TACTCCCGTT 10260 

TTGTGTAAAG AATGCCTGAG CATGCACCTT GCCTGAATGT TTTAAAAATT CATTCGGTAT 10320

CACATACTGC AAACGCCCAT TAATTGCGTC TACTATCGTT AATTCGTCTG AAATATAAGC 10380 

GCCTCTATCT ACGTTATAAT CATCGGTTTT TAAnAOGATA GATGTTTTAA CATGTTCAGA 10440 

ACTTATAGAT AAGGGTCTGT TATnCTTAGT 10470 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3647 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

ATCAGATCTT GAGAATCGAG TTATTAAGTC TATCGAAGAC TTAACTAAAA TCCAACCATT 60

CATGCCTACA CAAGATTTTG ATTTTAAAAC TAAAGAAATT CAATCAAACA TTTCTGAAGA 120

AAGATTTATC GAAATGATTC AGTATTTCAA AGAGAAAATA ACAGAAGGGG ATATGTTCCA 180

AGTTGTGCCA TCAAGAATTT ACAAATATGC GcATCATGCT AGTCAGCATT TAAATCAACT 240

TTCGTTTCAA CTGTATCAAA ATTTAAAACG ACAAAACCCA AGTCCATATA TGTATTATCT 300 
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TCAAATTGTA ACAACTAATC CTATTGCAGG TACGATTCAA CGTGGTGAGA CGACACAAAT 420 

AGATAATGAG AATATGAAAC AACTACTTAA TGATCCAAAA GAATGCAGCG AACATCGTAT 480 

GCTAGTTGAT TTAGGACGTA ATGATATTCA TAGAGTAAGT AAAATCGGTA CCTCAAAAAT 540 

TACTAAATTA ATGGTTATTG AAAAATATGA ACATGTTATG CATATCGTAA GTGAAGTCAC 600 

AGGTAAAATA AATCAAAATT TATCGCCAAT GACAGTTATT GCGAATTTAT TACCAACAGG 660 

TACCGTTTCA GGTGCACCAA AATTACGTGC AATTGAAAGA ATATATGAAC AATATCCACA 720

TAAACGGGGC GTTTATAGTG GTGGTGTTGG ATACATAAAT TGTAATCATA ACTTAGATTT 780

TGCATTAGCA ATTCGAACGA TGATGATAGA TGAGCAGTAT ATCAACGTAG AAGCTGGTTG 840

TCGCGTTGTA TATGATTCTA TTCCTGAAAA AGAACTGAAT GAAACGAAAT TGAAAGCTAA 900

AAGCTTATTG GAGGTGAGCC CATGATCTTA GTTGTAGATA ATTATGATTC CTTTACATAT 960

AACCTAGTGG ATATTGTTGC TCAACATACT GACGTCATTG TTCAATACCC TGATGATGAT 1020

AATGTGCTGA ATCAATCGGT GGACGCTGTT ATTATATCTC CTGGTCCAGG GCATCCATTA 1080

GACGATCAAC AGTTAATGAA AATCATATCA ACCTATCAAC ACAAACCCAT TTTAGGTATT 1140

TGTTTAGGGG CTCAGGCACT GACTTGTTAC TACGGTGGAG AAGTCATTAA AGGCGACAAG 1200

GTTATGCACG GCAAAGTTGA TACACTAAAG GTTATATCGC ATCATCAACA TCTGTTATAT 1260

CAAGATATAC CAGAACAGTT TTCAATTATG AGATATCATT CATTAATAAG TAACCCTGAC 1320

AATTTTCCAG AAGAATTGAA AATTACTGGA CGTACCAAAG ATTGTATACA GTCATTCGAG 1380

CATAAAGAAA GACCGCATTA TGGTATTCAG TACCATCCTG AATCATTTGC TACAGACTAT 1440

GGTGTCAAAA TAATTACAAA TTTCATTAAT CTAGTGAAGG AAGGATGAAA ACCATGACAT 1500

TACTAACAAG AATAAAAACT GAAACTATAT TACTTGAAAG CGACATTAAA GAGCTAATCG 1560

ATATfiCTTAT TTCTCCTAGT ATTGGAACTG ATATTAAATA TGAATTACTT AGTTCCTATT 1620 

CGGAGCGAGA AATCCAACAA CAAGAATTAA CATATATTGT ACGTAGCTTA ATTAATACAA 1680 

TGTATCCACA TCAACCATGT TATGAAGGGG CTATGTGTGT GTGCGGCACA GGTGGTGACA 1740

AGTCAAATAG TTTCAACATT TCAACGACTG TTGCTTTTGT TGTAGCAAGT GCTGGcGTAA 1800 

AAGTTATAAA ACATGGtAAT AAAAGTATTA CCTCaAATTC aGGTAGTACG GATTTGtTAA 1860

ATCAAATGAA CATACAAaCA ACAACTGTTG ATGATACACC TAACCAATTA AATGAnAAAG 1920

ACCTTGTATT CATTGGTGCA aCTGAATCAT ATCCAATCAT GAAGTATATG CAACCAGTTA 1980

GAAAAATGAT TGGAAAGCCT ACAATATTAA ACCTTGTGGG TCCATTAATT AATCCATATC 2040

ACTTAACGTA TCAAATGGTA GGCGTCTTTG ATCCTACAAA GTTAAAGTTA GTTGCTAAAA 2100 
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AAGCAACACT ATCTGGTGAT AATTTGATAT ATQAATTGAC TGAAGATGGA GAAATCAAAA 2220 

ATTACACATT AAATGCGACT GATTATGGTT TGAAACATGC GCCGAATAGT GATTTTAAAG 2280 

GCGGTTCACC TGAAGAAAAT TTAGCAATCT CCCTTAATAT CTTGAATGGT AAAGATCAGT 2340

CAAGTCGACG TGATGTTGTC TTACTAAATG CGGGTTTAAG CCTTTATGTT GCAGAGAAAr 2400 

TGGATACCAT CGCAGAAGGC ATAGAACTTG CAACTACATT GATTGATAAT GGTGAAGCAT 2460 

TGGAAAAATA CCATCAAATG AGAGGTGAAT AATATGACGA TTTTATCAGA AATTGTTAAA 2520

TATAAACAGT CACTTTTACA AAATGGCTAT TATCAAGACA AACTTAATAC CTTGAAAAGT 2580 

GTGAAGATTC AGAATAAAAA ATCTTTTATA AACGCAATTG AGAAAGAACC AAAGCTAGCA 2640

ATTATTGCAG AAATTAAATC GAAGAGTCCT ACAGTTAATG ACTTACCTGA ACGAGATTTA 2700

TCGCAACAAA TCTCAGATTA TGACCAATAT GGTGCAAATG COGTGTCCAT TTTAACTGAT 2760 

GAAAAGTACT TTGGTGGTAG TTTTGAAAGA TTACAAGCAT TGACGACAAA AACAACATTA 2820

CCCGTATTAT GCAAAGACTT TATTATAGAC CCGCTTCAAA TTGATGTTGC TAAACAAGCT 2880

GGTGCATCTA TGATTTTATT OATCGTTAAC ATCTTATCTG ATAAACAATT GAAAGATTTA 2940

TATAACTACG CTATATCGCA AAATCTAGAA GTGTTAGTTG AAGTACATGA TCGCCATGAA 3000

TTAGAACGTG CCTATAAGGT TAATGCTAAA TTGATTGGTG TAAATAACAG GGACTTAAAA 3060

CGATTTGTTA CAAATGTGGA ACATACAAAT ACTATTTTAG AAAATAAAAA AACAAATCAT 3120

TATTATATTT CTGAAAGTGG TATTCACGAT GCATCTGATG TAAGAAAAAT CTTGCATAGT 3180

GGTATCGATG GCTTACTAAT AGGTGAGGCG CTTATGCGTT GTGACAATCT ATCTGAATTT 3240

TTACCACAAC TGAAAATGCA AAAGGTGAAG TCATGATGAA ATTGAAATTT TGTGGCTTTA 3300

CATCAATAAA GGATGTTACA GCGGCCAGTC AATTACCTAT TGATGCGATA GGTTTCATCC 3360

ATTATGAAAA AAGTAAAAGG CATCAAACAA TTACCCAAAT AAAAAAGTTA GCGTCTGCTG 3420

TTCCAAATCA TATCGATAAA GTATGTGTCA TGGTAAATCC TGATTTAACA ACAATTGAAC 3480

ACGTATTAAG CAATACGTCA ATTAACACAA TACAGTTACA CgGCACAGAA TCTATTGATT 3540

TTATACAGGA AATTAAAAAG AAATATTCAA GCATTAAAAT CACTAAAGCT TTAGCTGCaG 3600

ATGgAAAACm TwATCCCAAA caTtAAtnAA tnTTAgGGGG TCCGTGG 3647

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5966 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
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GACGCACCAT GCGTTTTAAA TTTAATGCAC GATTGATACC ATTTTCATAA GCAGTTTTAG 1800 

ACACGAATGT CATTGACGTA CTTGTAAGGT TTCCGCCGTA TTGACCATAC ATTTTACGGT 1860 

ACTTCATCGG TTCAGATGTA GGTATAGAAC CATTTGCATC GCCATTTACG GCAGAGTTAA 1920 

TTAATCCGCC CTTTACAACT AATTCAGGTT TAACCCCAAA GAAAATTGGG TCCCATAAGA 1980 

CAATGTCAGC TAGTTTGCCC GGCTCGATAG ATCCTACATA TTCAGAAATA CCATGTGTAA 2040

TTGCTGGGTT AATTGTATAT TTAGCGATAT AACGTTTGAT GCGATTATTA TCATTATGTT 2100 

CAAAATCACC ATCTAAAGGA CCACGTTGTT CTTTCATGCG ATGTGCTACT TGCCATGTTC 216 0 

15 GTGTAATTAC TTCACCTACA CGGCCCATTG CTTGTGAATC GGAACTAATC ATACTGAATA 2220 

CACCCATATC TTGCAGAACA TCTTCTGCTG CAATCGTTTC TTTACGAATA CGTGAATCTG 2280 

CGAATGCGAT ATCTTCAGGA ATAGCCGCAT TTAAATGGTG AGTAATCATT ACCATATCTA 2340 

AATGTTCATC TACAGTATTA TGTGTATAAG GCAAAGTTGG ATTTGTAGAT GAAGGTAAAA 2400 

TATTTGAAAA TGCAGCGGAT TTAATTAAAT CAGGCGCATG ACCGCCACCA GCACCTTCAG 2460 

TATGGTACAT ATGAAGTACA CGGTCTTTAA CAGCAGCCAT TGTGTCTTCC ATAAATCCTG 2520 

CTTCATTTAA AGTATCTGCA TGTAATGCAA TTTGAACATC AAATTCATCA GCAACATCTA 2580 

ATGCATGACT CAAAGCAGAT GGTGTTGCAC CCCAGTCTTC ATGTACTTTT AATCCAATTG 2640 

30 CTCCGGCATT GATTTGTTCA ATGAGTGCAG TTGGATTTGT TGCTTGTCCT TTACCTGTAA 2700 

AACCGACATT AATCGGTAAA CcTTCGGCAG CTTCTAACAT TCTATGAATA TGCCATGGAC 2760 

CTGGAGTTAC AGTTGTTGCT TTAGAACCTT CTGAAGCACC AGTACCACCA CCAATATGAG 2820 

35 TCGTAATACC ACTTTCTAAT GCGACCTCTG CTTGTTCAGG ATTAATAAAA TGAACATGAG 2 880 

TATCAATACC ACCAGCAGTG ACGATTTTAC CTTCAGCGGC AATGATATCT GTTGTTGAAC 294 0 

CTATAATAAT GTCGACATTA TCCATTATAT CTGGGTTGCC GGCATTACCT ATGGCGAAAA 3000 

40 

TATAACCATT TTTAATGCCT ATATCAGCTT TAACCACTTT ATCGTAATCG ATAATAACGG 3060 

CATTAGAAAT GACAAGGTCT GCAACGTTCA CGTCATCACG TGTTACACGA GGATTTTGCG 3120 

CCATACCGTC TCTAATAGAT TTACCACCAC CAAAAGTAGC TTCTTCACCA TAAACCGCAT 3180 

45 

AGTCTTTTTC TATTTGAGCA AATAGATTCG TATCACCTAA ACGAATGGAA TCTCCAACAG 3240

TTGGACCGTA TAAGCTCGTA TATTGATTTT GCGTCATTTT AAAGCTCATG ATCTTTTTCC 3300 

SO TCCTTTTTTA TTCACGTTTT CAGCACCGTT ATCTCCGAAT ACACCTGCAT ATTCATCATT 3360 

TTCATCAGTT GGGCGATAGA CACGTGACTC ATCGATAGGA CCATTGACCA TACCACGAAA 3420 

ACCAAAAATT TTACGTTTGC CAGCATATTC AACTAATTGA ACTTCTTTTT TATCCCCAGG 3480
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TTCGAAATCT AATGCTGCAT TTGCTTCATA AAAATGAAAA TGTGAGCCCA CTTGAATTGG 3600 

TCGATCTCCT GTATTTTCAA CTTCGATAAC TGTTTCAGGA TGATGGTTAT TAATTTCAAC 3660 

5 

CTCTGTACTT TTTGTAATAA TTTCTCCTGG TATCATTTGA CTGCCTCCTT TAAACAATAG 3720 

GGTGATGTAC TGTGATTAAC TTAGTACCAT CGGGGAAOGT AGCCTCGATT TCGATATCTG 3780 

TAATCATGTG TTCGACACCA TCCATGACAT CTTCTTTGTT TAGAATTTGT CTACCATAAC 3 840 

10 

TCATTAACTC TGCAACGGTC TTACCATCGC GTGCACCTTC TAATAATTCA TCGCTGATTA 3 900 

AAGCTAATGC CTCAGGATGA TTTAGTTTCA AACCACGTGC TTTACGACGA CGTGCAACTT 3 960 

15 CCGCCGCCAC TACAATCATT AATTTGTCTT GCTCTCGTTG TGTAAAATGC AAATTAAAAC 4020 

CCCCAATTTC ATATTAGATA CaATTTACAA AATTTATATT AATCCTAATT GTTGTGATAA 4 080 

ACAAGTAATA TACAAAGTTC AATGTGTAAT TAGAAAATTA TATTTTTAGC ATATCCGATA 4140 

20 

TTGAAGCAAA CAATCTAATC GAAAACAAAT AGTGGAATAT ATTTATGTAA AAACCAAAAT 4200 

AGTTTTTAAT ATAACTTTTC ATAGAATAGT AGTATATTAA TGAGTAATOA TTCAAAGGAA 4260 

AGGTGAAAGA TTTGAAGATA ATAGATGTGC TTTTGAAAAA TATATCTCAG GTTGTGTTAA 4320 

25 

TTAGTAATAA ATGGACAGGA TTATTTATCT TAATAGGATT ATTTGTAGCC GATTGGACAA 4380

TTGGATTAGC GGCTATTGTA GGTAGCATCA TCGCCTATAC TTTTGCGCGT TTTATAAATT 4440

30 ATAGTGAGGC AGAGATTAAT GATGGGTTAG CTGGATTTAA TCCAGTGCTA ACTGCCATTG 4 500 

CGTTAACAAT CTTTTTAGAT AAGTCAGGAT TAGATATTGT TATAACAATG ATAGCAACTT 4 560 

TATTAACGTT ACCAGTTGCT GCTGCAGTGA GAGAAGTTTT AAGACCATAT AAAGTTCCGA 4 620 

55 TGCTGACGAT GCCTTTTGTC ATTGTGACTT GGTTTACAAT TTTACTTTCA GGACAGGTTA 4 680 

AATTTGTAGA TACATCGTTA AAGTTAATGC CTCAAAACAT TGAAACGGTT AATTTTAGCA 4 740 

ACAATGATAG AATaCATTTC ATTCAGTCAT TATTTGAAGG ATTCAGTCAA GTATTTATCG 4 800 

40 

AAGCGAGTGT AATTGGTGGC GTATGTATTT TAATCGGCAT ATTGATAGCA TCAAGAAAAG 4 860 

CAACACTCTT AGCTGTTATA GCTAGTTTGT TAAGCTTTAT CATTGTAGCT CTATTAGGTG 4 920 

GTAATTATGA TGATATTAAT CAGGGATTAT TCGGTTATAA CTTTGTATTA ATGGCAATCG 4 980 

45 

CACTAGGATA TACATTTAAA ACAGCGATTA ACCCTTATAT TTCGACTTTT TTAGGTGTGT 504 0 

TATTAACAGT AGTGGTGCAA CTAGGTACAA CAACATTGCT TGAACCGTTT GGCTTACCTG 5100 

50 CATTAACATT GCCATTTATT ATCGTGACAT GGATTTTATT ATTTGCTGGT ATTAAACATG 5160 

ACAAAGTAGA TGCTTGATAG TTAAATCAAA CCTAATATTG TTTGAATATC ACCTTAAACT 5220

ATACAGCGAA TTGTATAGTT TAAGGTGTAT TTTTATGGAT AAAATTAAGT GCATACTTAA 5280 
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GTGTTAAACT AGGAATAAAT AATTTATATT GTGTGTTGTG TGGGGTGACT AATATOAATG 5400 

ATATGGATAA TTCCTTTTTA ATAACAACGG AAATTCAAAG AAAATGGATT GAAAAATTCA 5460 

AAGTAATTAG AGATACATTT AAGGCTAAAG CTGAATATAA TGATCAACAT AGCCAATTTC 5520 

CATATAAAAA TATTGAATGG TTAATTAAAG AAGGTTATGG AAAATTAACG TTACCAAAAG 5580 

CATATGGTGG TGAAGGTGCG ACCATAGAAG ACATGGTTAT TTTGCAATCA TTTTTAGGCG 5640 

AACTTGATGG TGCCACAGCA TTATCTATTG GTTGGCATGT GAGTGTCGTA GGACAAATTT 5700 

ATGAACAGAA ATTATGGTCT CAAGATATGT TGGAGCAATT TGCTGTTGAA ATTAATAATG 5760 

GTGCATTAGT TAATAGAGCA GTTAGTGAAG CTGAAATGGG TAGTCCAACA AGAGGGGGAA 5820

GACCAAGTAC ACATGCTGTT AAAGCTGATG ATGGGTATAT TTTAAATGGT GTGAAGACAT 5880 

ATACATCAAT GAGTAAAGCA CTAACACATA TTATTGTTGC TGCTTATATA GAAGAATTAG 5940 

AAAGTGTTGG TTTTTTCTTA GTAGAC 5966

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 17310 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

CTGTGXCATC GCGAAATAGT TAGGGTCATT CATTAATCCT TTTGAACGTA TTTCATCAAA 60 

ATATAACAAT TTCATTAGTA AAGGGGACTT GTTCAAACCA GCTATAATAC AAAATAGACC 120

TATAGTCACA CTGCTTATAA TATAAGAGGT AACGATCACT TTTTTGCTAT TACCTAACTT 180 

AAAGSTGATC ATCCCTAAAT AGAAATAAAT GACTACAAAT GCATATTTAA CTGTAGATGC 240

AAGAACTTCC TTAACCGTAA TAAATATCAA ATCATCAAAA AATaGCaAAC AArGCGTAAT 300

AATCATACGA TATGTATACA AAATAATGAm AAACTGTmAA AAATGATTTG CCTTTAATAA 360 

ATGGTTAGCG AAAAACAGTA AATAAACTAA TATTAGTAAT GTGATAAAGT CAGCTATAGA 420 

45 

AACATTCACA CCGGCAATAA CCGAAGATTG CTGAATAAAA ACCGCTAAAC CGATAAGTAA 480 

CAATGTTAGT AATTTACTAT TGTGTTGATT TTCCATTATA AACGTCTTCC ACTTCTTTAA 540

TCATTTTCTC CTCAGTAAAA CATTCTAAAT AACGTTTTCT AGATTGATTA CTCATTTTGA 600

TGTAATCACT GTCTATTAAA TATTTTTCCA GGACTTTAGC AATAGTTTCG GGTTGGTTGT 660

TCATCATACA TATACCATTA TCAGCTACTA ATTCTGAAAT ACCGCCAACA TGACTGGCTA 720 
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TTATTAAAAT AAACGTATCG TATTGTGATA ATAAATGACT CGCATTAATG ACATTGCCCA 84 0 

AAAATGTGAC ATCATTTTCT AACCCAGCTT GTACAACTTG TTGCTGACAA TCATTTAATG 900 

5 

TAGGTCCATC GCCTATAAAT GTAAAATGCG CATGATTACT GTTATGTAAT TTCAATATCT 960 

CTATTGCCGC GATTAGATTT TGTGGCAATT TTGGATAAGC AAATCTTGCA ATCATAACAA 1020 

ATTGATGCTT TGTCGGGGCA TTAATCTGTA AATCTTGTTT ATTAGGCAAC ATTCCAACTA 1080 

10 

CTTCGCCAAT ATTGTTATGT GATTGGCTTT TTAGOGTTTG CTTAACAGCG GGAACATCTG 1140 

CAATACCATT ATGTATTGTG GTTAATTTCA ATCGATTAAA TCGATATTTT AACGCTAACT 1200 

15 GTTTATCGAA ATCTGAAACA CAAATAATGC TATCTGTAAT AAGTGACATT AATTTTTCGA 1260 

TAACTAAATA TAGAAATTTT TTAGCTGGTT TAACACCCTC TGTAAAAGCC CATCCATGTG 1320 

CAGTAAAAAC TATACGTGTG TCTTTCGATT TCGAAATGAa CTtCGCAATT CGTCcGACCG 1380 

TtCCAGCTTT GGAAGAATGT AAATGGATAA CATCAGGTTT AATTTTCGAG AATAACTGTG 144 0 

CTAACACTTT GACAGCTAAA ATATCTTGTT TAAAGTCAAT TGGACCTACT AAATGTTCGA 1500 

TAATAATTAC ATTAACTCTT GCATCTAGTT GTTCAATCAT TGGTCCATGA TTGCCTACAA 1560 

25 

TGACATAAAC ATCATTGTGT ACGCAAAAAT GGTTGGCGAG TTGAATGAGA TGTGTTTGTG 1620 

CACCACCATT GTCTGCTTTA GTAATACAAT ATATAATTTT CAACTGTTAC AAACCCCTTT 1680 

30 AATGCTATAC TTTCAATTTC TTAACATGGC TATCTCATCA GATGAATAGT ATTTATAGCC 174 0 

ATGCAAATCA ATGATGG CAC ATATTTCTTA ATGCCATTTG ATACTGTCTC AAGGGATTCC 1800 

TCGTTATACT GTAACAATTG GTCACAATCT TTAAAATATA ACTTTTATTT GAACTTATTA 1860 

35 AGTAAATTAA GACTACCTTG AGCCTTCCCC TGTAATAACA ACCATCAATG TTCTAATTGA 1920 

TATATATAGT TCCATCATTA AACTACCTTT ATGTATATAT TTCATGTCAT ATTTCAGTTT 1980 

TTGTTGCGGT GTTAAGTCAT ATCCACCTTG AATTTGCGCA AGTCCTGTTA ACCCTGGTGT 204 0 

AACAAGACAT CTTTGCTCGA AACCTATCAC TTCTGAACTA AATAATTCTA CAAATTCCGG 2100 

ACGTTCCGGG CGTGGTCCAA TAAAACTCAT TTCCCCTTTA ACAACATTAA TTAGTTGTGG 2160 

TAATTCATCA ATGCGTGTTT TACGAATAAA CTTCCCGACA TTTGTTATAC GAT CAT CATC 2220 

45 

TTTATCAGCC CATTGCGCAC CGTTTTTCTC TGCGTTTTTG CACATCGAAC GTAATTTGTA 2280 

TATTTTAATT AATTTACCCA TCTTCCCAAC TCTAACCTGA CTATAAATAG GGTTTCCTGG 2340 

SO CGAATCTATG ACGATAGCAA TGGCGAATAT AACCATAATC GGTAAAGTTA AAAATAATAA 2400 

AACAATGCTT AAAATTAAGT CAATCGCACG TTTAATTGGG TAATAGCTTT TTCTCACTTC 2460 

TTCTAGTTTG TCTAATTTTC TTTGATAGGC ATAACCCTTA TTATTATGGA CAGCTTCAAT 2520 
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AATTAAAGTA ATCCTTTAAA CCTOTTTCTA CTGTATATTT AGGAACAAAT CCTAATGCCT 2640 

TTAAGTTAGA AATATCTGCA TAAGAATGCT TAATATCTCC TTTTCGTGCT TCTTTAAATT 2700 

5 

CATGCTCGAC TGATTTTCCA TATAATTCAC CAATAATACG ATAAACCTCT AATAAATTAG 2760 

TAAAAGTGCC TGTACCAATG TTATAACCGT GTCCAATTGC ATCTTTGTGT TCCATAATTA 2820 

AGCGTACAGA TTGAACAACA TCATATACAT ATACAAAATC TCTAGTTTGC AGTCCGTCAC 28 80 

10 

CAAAAAATGT AAATGGCTTG TTATGCTCAA ATGAATCGAA CATCTTTGAA ATCACACCTG 294 0 

AATATTGTGA CTTAGGATCC TGTCTTGGCC CAAATACATT AAAAAATTTA ACAAC CGCTG 3 000 

15 TTGGTATGTT ATATAACGAA CAATAATTTA ATGTCGTCCG TTCGCCGTAA TATTTATCTA 3060 

TTGCATATGG TGATAATGGT AAGATTAATG ATTGATCACT TTTAGGCAAA TCAGGAAGAT 3120 

CACCATAAAC AGCTGCTGAC GAAGCAAAGA TAAAACGTTT TATATGATTA TTATATTTTT 3180 

20 TAATGATTTC TAACAATCTT AATGTTGCTA CGACGTTTAT TTCTTGAGAT AAGATAGGTT 3240 

TCTCAACCGA CTGAGCAACA CTAACTAATG CTGCTAAATG AATAACATAA TCAAATTGAT 3300 

ATGTCTTCAT GATTTGTTCA ACTGCATCAT ATTCACGAAT ATCTAATTCA AACACATGAT 3360 

25 

CGTCAGCCAA ACTTTTAATA TTTTCTCGTT TACCTGTTCT ATAGTTATCT AGAACATAAA 3420 

CATCATAATC TTGTTGTAAA TCATCTACTA AATGCGACCC AATAAAACCA GCCCCACCAG 3480 

30 TTATCAAAAC TCTTTCCAAA TCTTCCACCT CATTTATACA TTAAAAATAT ATCATAAAAA 354 0 

CATAAAGTAT TGTAAGCTTT TTATCGATAT TTTTTATTTA TAAAAATAAA ATGAGATAAC 3600 

TTTGTGAATT TTTATTGAGA TAAATTAGAT AGTGGTGTTT TTGTGATGTT TTATAATATC 3660 

35 TTGGGTGTGT TAATACTAAT AATGCTTTCA ACTGATGCAT TAGACTGTGA CATCATAACT 3720 

CACTTAAGAA CTTCGCTTAT TAATTTTCTA CCAATACACT CCCTTCTAAG TGCACTAAAA 3780 

AATCCTTACT GCTAAGTGAT TAAACTTAAC AATAAGGATT TATTTATCAT TAGTGGATGA 3840 

40 

TTATTAACGG AATCTCATAC CACCATCTAC AATAATTGTT TGTCCAGTAA TGTAATCAGA 3900 

GTCTTTACCA GCTAAGAAGC TCACTACATT TGAAACATCT TCTGGTTGAG AAACTCTGCC 3 960 

CAAAGCAATC TGACTTGTAA ATTGTTCCCA ACCCCATGCT TCAGGTTTAC CTGCTTCTTC 4020 

45 

GGCTGTTGCC ACTGCGATAC TTTCCAT CAT TGGTGTTTGA ACGATACCAG GTGCGAATGC 4080 

ATTCACAGTA ATACCTTCAG ACGCTAAATC TTGTGCGGCT ACTTGTGTTA AACCTCGCAC 4140 

SO TGCGAATTTT GTACTGCAAT ATAAAGACAA GCCTGGGTTA CCCTCAACGC CTGCTTGAGA 4200 

TGTTGCATTG ATAATTTTAC CGCCATGATT GAATTTTTTA AATTGTTCAT GTGCGGCTTG 4260 

AATACCCCAT AGCACACCTG CAACGTTCAC GCCATATACT GTTTTAAACT GTTCTTCAGT 4320 
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GCCAAATTGC GCGCCAGTTT GTCTTAcTGC GTTAAATACA TCATCACGGT TTGATACATC 4440 

TGCTTTGATA GCAATAGCTT TTGTACCATC ACTTGATAAT TTAAGTGCAG CTGCTTTTGC 4500 

5 CCCTTCTTCA TTGAAATCAA CAACTGCTAC TTTGAAACCA TCTTCCACTA AACGTTCTGC 4560 

AATTTTAAAA CCAATCCCTT GTGcTCCGCC AGTTACTAAT GCTACTTTGT TGTTTGTCAT 4620 

AAAGATCACT CCTCAAATTT CTTTCCTTTA ATTACATTTT ACTCCTCTTC ATTTGAATAG 4680 

10 

TACAACAAAG GTAGCTCCAT TTAACAAAAT ATTCAGATAT TTAAGGTATA GTTAAACGCA 4 740 

CTACCATTAG TGATTGGCAA TGCGTTTAAA TGTCGTTTTA AAAGTTCTTA TGTTGAATAT 4800 

15 TATTTTTTTA AGTCTCTCGA TTAGTTTGTC ATCAATCTTT TTTCGAGACA TGGTCTTTTG 4860 

ATTCAATAGG CGGTTCCGTG TTATCACTGA CAACTTTAGT TGTAGCTTCA TCTTTATGTA 4920 

TTTCTTCGTT AAATCCTTCA AGGTTTTTAG TCGTGGGATT TTTAACCTCA GGATGTTCCA 4980 

20 TCATGTCTTG ACTATCAAGT TCCTTTTTAC ACGTGTCTTT ATGTGATGCT TGATTTGCGT 5040 

TCCCTTTACT TTTTTGAATA GTGGTAGTAT CTGCTGCAGC TACTAATTTT TTTCTACCTA 5100 

AAATAGATAT GGCTGAAACA AACCAGAGTA TTG CAG AT AC AAAGTTGCAT AATACTAAAG 5160 

25 

CGATAATAGC GAATACAATT AATATGACAC CTTTTGAAAT CCTTTCTTTA AATAAGTCAG 5220 

ATGCCAATAC GATGACAGGT ACGATTGAAA GTATAATTAC AAATATAGAA ATTATTGCCG 5280 

ATATAACTAT TGTTACTATT AAATAATCAG CTCTGCTACC TGATAATAAA TAGAAAAGGC 534 0 

30 

CGAAAATTAG TCCATAGCAA ATTACAAACC CACATAAAGT TATAGCCATG AGTACTATAT 5400 

AAGCTATTTG AAAATATAAA CCTATCTTTA TGAATGATTT TTCTACATTT TTTTCCATGT 5460 

35 CTATTCCCCA TTTATTTAAA ATTTATACTT TACCTTAAAT ATTCTCTTTA TTCTTTAGTG 5520 

ATTTTATCTT TAGATTCAAA TTGATTCTCT GTACTTTCAA TATCAACTTT TTCATTTTCG 5580 

TCTGTCGATT CATC T T TT GA GTATTTATTC CAAATCAGCA AAATACCACC AATCAGCCAT 5640 

40 AAAATTGACG AAAGGAAATT ATATAAACAC AGTGCAATAA TAGCATAAAC AATAAAAAGT 5700 

GCACCTCCGA TTACAGAGTA ACTTTCCATA TAAATCGCAG TAAAGATGGT TGGTAAAACA 5760 

GTGAAAAGAG CCAATATTAA TCCTAATAAA AAAATTGTTT CGTAATCAGA TCCTCCAGCA 5820 

45 

ATATTAATAG ATATCATCCT AACAAAAACG ACACTAAAAT ATATTTGAGC TACGATGCCT 5880 

ATCCAAATTG CTATTTTTCC TATAATTGAG CTCATACTCA TTCCCCATTT ATTTAAAATT 5940 

50 TATACTTTAC CTTAATATAC CTTATTTTAT TTAATTTTTA TATGCAAAAT ACAAAAATGG 6000 

AGAACTTCAA TATTTATAAA ATATCAAAAG TTCTCCACAC TATATTGTTT TATTATATTT 6060 

TCGCTATCAA TACGCTAAAT CAT CAT ATTT CCCTCAACAT CACAGTAAAA CTATTGCTCC 6120 

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X X\m-\m-£V\X XVJV*. 


wwiy i iu i iL 


JA fA 7A TP TV *TV* TV 

AAUAxvwVTCA 


TCTTGTTTAA 


GTAATGC CAG 


TGGTACTTGA 


6240 




AfSATTft Anaf" 




TV TV TTA T*^P TV TA TV f*3 

AAxAi xAAAG 


CGTGTCACAC 


CTGCTGGCAC 


AGTTTC C CCT 


6300 


5 


X inluiviUin 




*P*T1 I' TA TV TV 

1 1 v-U I x A 1 AA 


CTCAATGGCT 


GATACTTCAT 


GAGTACATCT 


6360 




XVJ X X ununnn 




iv» t*fv /vTwrv^ *n 
1\»TACCTTQT 


GCAATTCTCT 


^WT*1V ^1 m M «L « ^» » 

CTACAGAACA 


ACAACCACTA 


6420 


TAACTTGCGA CAACCTTTTC CCATACTTGA AAATGTGCTT CGCCTAAATC TTTTGTATAC 6480

AAATATTGTT CTGTATCACC ATGACACATT GTAATAAATG GCGCTTCTTG TCTTGTCTCA 6540

GTAGTCCATG GCAAGCGATG TTCTTGTTGT AACGTTTCCC ACCACACACC AAATGGAACT 6600

TTATGTTGCC ATGTACTAAT TGAATATTGT GTTTCATGGA TTTCTTGCAC TGGAACTTTC 6660

TTACATCCTA ACGCTTTCAA ACTTGTATAC CGATGCAGAC CATCTATAAC CATATATCTA 6720

CCATGTTGCA TCGCTGTCAC TAAAATAGGA TGACGTATAA AATCATCTGC TTCAATACTA 6780

CTTTTCGTTT TTTCCAATCT TAAAGGTTCG AATGTTTCGT GAAGATCAAT CTTATCTACT 6840

GGTACCAATT TTAAATGTTC ATGAATATGA TTCAATAGTT ATTCATCCTC CTTTGTTTGT 6900

GTTAAATAAA TAAATTCAGG ATGTGGATGG CTTAAGAAAT CGTGATGTGA AATAGACCAT 6960

CCGTATGCAC CTGCATATTT GAAAACAATA ACGTCGCCTG TACTGATTGC GTCTATCTGT 7020

ACTTGTCTAG CAAAGACATC TTTCGGTGTA CATAATTGAC CGACTAACGT TGTGTCCTGT 7080

CTCGAAATTG AAACTTTTTC AAATGAATAT GGATTGTCCT TATAGCGATA AATGTCAAAA 7140

GGATGGTTAT GTTGCCAAGA TACCGGCAGT CTAAATTGTT GCGTACCTCC TCTTAATATG 7200

GCATACCAAG CACCATGTAC TTTCTTAATG TCTAGCACTT CTGTCACATA GTAACCAATA 7260

TGTGCCACAA TAAAGCGCCC ACATTCAAAG TTCAATGTCA CATCTTCCAT TTCTTGCTCA 7320

ACGATAAGTG TTTTAAAACG TTCTACAAAA TTATCCCATT CAAATTGGTT AGTTAAATCT 7380

GCATAGTTAA CGCCTATGCC ACCACCAAGA TTGATATGTT TGAGTGGAAA TCGATGTTTT 7440

TCAGACCATG CCTTTGCTTT TTTAAAATAA AGTTTCACTA CATCGACATG TAAATTCGAG 7500

TCTAAATTGT TAGAAATAGA ATGAAAATGA AATCCATCTA GATGAATCTT TGGCATTGCG 7560

AGCGCAgcTT CAATGACATC ATCAACTTCG TCTTCAGAAA TACCAAATTG TGTTGGGCGT 7620

CCTGCCATAT GCAACGTTGC ATTGGGAAAT GGTCCTGCTA AATTAACACG CAATAAAATG 7680

TGTTGTGTCT TATCTTCATC TTCTAAGATG GCATTTAGCC GTTGTAATTC ATGCATACTT 7740

TCAACATGAA TACGCTGAAC ACCTTCACTT ACTGCATATC TTAGTTCCTC GTCTGTCTTA 7800

CCAGGGCCAC CAAAAATAAT ATGATTTGCT GGTTTAAAAG CAAGACCTTT TGCTATTTCA 7860

CCTTGAGATG CAACTTCGAA TCCTTCAACA TACTGACTAA TTGTATCTAG GATTTTTCGT 7920




EP0 786 519 A2 



TGTTGCAAAT GATGTTCCAG TCCGACTAAA TCATAGATAT AATGACAAAC TGGATOAGAT 8040

TGTGCTTTTA ATTGTTCAAT AACAGGTTGA ACTATACGCA TTAGCCTTCA TCCCCTTTCT 8100

GTTTAGACGT CGCTAGAGAT GCACTTAAAT GGCGATATAT TTTTCCGCGA TCATCACCTA 8160

AAATAAATGT TTGTACACCT TGTGCCTGCC ATTTTGCAAT ATCTTCATCT TCACGTGGTA 8220

ATGCACAAAA ATGTTTACCA TGTGCATTCA CAACTTCAAA AATATGTTGA ACATGTGATG 8280

TTACTTGATC ATCACGCGTT TGCCATGGTA TGCCAAGTGA CTGCGATAAA TCTGCGGCAC 8340

CTTCGACTAT CATGTCTAAA CCTTCGACTT GTGCTATATC GTCAATGGCC ATAACCCCTT 8400

CAACATCTTC TATCATGGCA ATCACCATAA TATGCTCATT AGCCATCTCC ATTGCATCAA 8460

GTAATGGTGT ACGTCCAAAT CTTGCCATGC GACCACCATT CAAACTTCTT AATCCTTGCG 8520

GGTAATAACG ACTTAATTTC ACAATATGCT CAACTGTCTC ACGATCTTTA ACGTGTGGCA 8580

CAATAATACC TCTCGCACCC ATATCCAACA CTTTAATGAT ATCTCTATCT ATCACTGCAG 8640

TGACACGTAC AATTGGTATA ATATGCGCTG CTTCAGCTGC ACGAATTAAA TGCGCTAGTG 8700

TCTCATCATT AATCGCCACG TGTTCTGTAT CAATCACAAC AAAGTCATAC CCGCTTGCTG 8760

CGATAACCTC GATCATCAAT GGGTCCGGTA TAGAATTAAA AATGCCATAA ACTGAATCAC 8820

CATTGTTTAA TCTATGTTTC AGAGATAGTT GTTGCATCAT TGATACCTCC TACACCTAAT 8880

GGATTTGTAA CATGATGAAT TCTTAACTCG GAGTCACTTA ATAATCGACX3 TGTCGTTAAC 8940

TTTTCAACTT GAATCGTAGG TTCAAACAAA TCGAAATGTT GATAGTTATT CAACTCTGGA 9000

AATGCTTCTT GATACGCCTC GATGATGCCT TTAACCCATT GCCATTGCAG CTCCTCATCG 9060

ATACCATATT GCTTTTCAAT AAATAAGATG ATTTCGGCGA TATTAATAAA GAAAAATGCA 9120

TCATGTAAAA AGTCGCGTAC TAAACGTTCG TCATCTGTTT CAATAAATGA ATTACTATTC 9180

ACTTTTTTAT GTGCTTCTGG CATTGGCTTT AATGTCAGGT GTGAAGCAGC TTCACTTAAA 9240

TGCtCACGCT TAAAACGAAC ACCATCATGG AAATCTTTTA AGGCAATACG TGTAGGCCAA 9300

CCATTTTCAT GAATGAGCAT CATATTTTGT GCATGCGATT CAAAGGCAAT ACCGTGATAA 9360

TAAAGCATAT GAATCATTGG ACGAATCGCT ACAGCTAAAA ATTGCTTTGT CCAAGCTTCA 9420

GAACCATATT GTTTAATCCA ATTTTCAATG AATGGTACAC CATCCTTATC ACTTGCATAA 9480

AGTGCATTAA ATGGTATCGC ATCCTCTTCA TCGATTAACA TATGATATAT ATTTTCACGC 9540

CATATAACAC CTAACGCACC ATAAACTTGA GTTTGTTTAT AAGGCGAAAG TTGTGTATTT 9600

AAATAAGACT GTCCTAAGAC TTCCCCTAGA AAAACTGTCT TTAATTCATC TTTTAAATAC 9660

ATATCTTGTT GCTGTATCTG CTTTAACCAA TCCGTAATTT GCGCTGCATT TTCAATTGTA 9720
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TATTTTGTCG TGTCTATTGG CGACATCGTA CGAATCGATT GTTGAGGGTG ATATAGCTCA 9840

TCACTTTCCC CTAACCATAG TACTGTGCCA TTAAGCCTTT CTTCAGCCAA ATCAACTTGG 9900

ATGACATGTT CAAACTGCCA TGGGTGTACA GGTATCATCT CAACATCATT TACATGTTTG 9960

CCAGATGCTT CAATTTGCTG TACAAAATGT TCATAAGTCT TATCGCCAAC TTGTTGACGT 10020

AACATTTCGT TAACTACAAC ATTTCTTGAT ACCGTCGTTT CTACTTTATC TTTGTCGATA 10080

GCTAACCACT GCAGTTTAAC GTTTGGTACA AAATCAGGAC CAAATTTCAA ATTATCACTC 10140

AACGTAAATC CTAAACGTGA TTTGTAACTT GGATGATACT GATGCCCTTC CATCGCATAA 10200

AATTCATAGT CGTTAAATGT CTCAGGTGTT GCTGGTGGGT TTGATTCTCG ATACTGCATA 10260

CTTTGCGTAT CTTTTAATTC TGTCTGTAAT AACTCGACAA TAAATTGTTC TAGCTTTTCA 10320

TCATTTTTAG GAAATGTAAA TACAACCTCT CTCAATAATT GTGTATAGTC TGTTGTTGTA 10380

TCTGCCTCAT CTCCTACGAC ACGCTCAATT GGTGATGTGA TACGTATACG ATCAAAGCTA 10440

TGTGTCTTTT CAGCAGTAAA ACGATACTCT GAATCATGTC CTTCTATTGT AAAATGACCG 10500

ACACCGTCTT GATATGACGC TTTATACACA ACAATATTCT CATAAATAAG TGATGATACC 10560

AGTTGGTGCA TCACTCTAGT CTTTACACGA TTAAGAATTG TTTGATTCAC AATACGATAC 10620

CTCCTTGTTA TGACAAATTG GATTTGGTAT ATGTGTATAA ATAGGGTTTG CACCACAATC 10680

ATTCAATTTA CTCATCAAAT TCGCTTTAGC CGcAATGGTC GGCGTTTGAT ATAAATCTTC 10740

TACACAGTCA ACAAATACTG CGTTATTCGC GTATTCTTTT TTCCAAGTCA TAAGACGATG 10800

CGCTACAAGT TGCCATAACA CAACTTCATT TCTAGTCGCT TTACCAATAG TTGATACTAA 10860

ATGTCCTAAG TGATTTACTA CAACGTAATA TTTAAGACGA TGCCATGCTT CATCATGTGC 10920

ATATACAACA GGGCTTGATG CTGCCACAAC ATTTGGCACA AGCTGTTTTT CAGTAGCAAT 10980

CGTTSITAGAT AGACAAATGC CTTCAAGATC TCTGACAAAG CATACGTCGG GTATGCCATC 11040

TTTTAATTCA ATTAATGTAT TTTGTACATG TGCTTCTAGA CTAATGCCTG TGTTACTAAA 11100

CAGCTTTAAT ATCGGCAATA ATGTACGATT CAAATAACAT TCAAGCCATG CTTCTGGTGC 11160

TAAACCACTT TGCTCAATCA CTTGTGATAA CTTAGACATC GGTGAATCAG GCATCGTTTC 11220

AAATAATGAC GCCAATACAT GAATATCTTT ATCAGCATGG TAATTCGGTA TCCCTTCACG 11280

AACAATCATG GCACTATTTG TTAATAAATC CATTTCAGGT TCAACTGTTT GCCCTAATGG 11340

ATTCGGTAAC AATGCACGAT ATCCTTCTTC AAACATCAAT TTAAAATGGG GTGTTTCAAC 11400

CTCATCTTTG ACTGATGCGA TAACTTGCGC GGCATCAATT GTCCGTTCAA TCTGTTCAAG 11460

GTCATTCGTA CGTATAAAAT TAGTGATTTT AACGTGTATC GGTAATTTTA AATAAATGTT 11520
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GCCAAGGTCT TTTATTAAAC CTTGTTCACT ATATTGCATA TACTGTGGAT GCTGTOGCAA 11640

CACATTGATT TGATAAGGAT GTGTTGGTAA TAAAATAAAA TCTTTGGGTA TCTCTGATAT 11700

ATCTATGTCT GCTAATTGAT ACAACACTTT CTCAACCTGA TCTTCTTTAC CTTCTACATA 11760

GCGCGTGAGC AGAACATCTT GATGCACAGC TAAATAATGC AATTGGAATG ATGTATGACA 11820

TTCGGOTGCA TATTTCTCTA AATCTGCTTC TGAAAACCCA CTTGCACTCT TAGGAGTCGG 11880

ATGAAATGGA TGACCTAAGT ATAAAGATTG TTCTGAAACG ATATAACGAT CCTCTACGTA 11940

GTCTATTGTG TTACTTTGCA AATAACGTGC CGTGCGATGA ATGCTATTAT CGATGTCAGA 12000

CATAATTTGC GCCATATGTT GTTGCACTGC CGTTTGATTA TCTGCACTTT GAGCCATATG 12060

TTGCAAAATA CGCGCAATTG CTTCTTTATA AGTTGTTATT TTTTTACTTT TTCCATCGAT 12120

AAGCCATACC TCTGGATGAT ACATATGATG CCCCATCGCA GACCAATAGC GAAATTCACC 12180

CGTTAAAGTT TCGAGCTCTG ATAATTGTAT AGACCATTGA TGATTTTGAG GTGGTACTTG 12240

ATATAAATTT TCTTCTCTAA AATATTCATT TAAAATGCGT TCGATAGCCG CATACGCTGC 12300

ATGTTGTATT AATTCTTTAT TTTGCACTTT TTTGTTTCAA CTCCCATAAT TTCATTAATG 12360

TGTGATCGTT GATTTGATTA GTGATGGTTG AACAAATTAA AAATAAACTA CTTACTGCAA 12420

ATACTACGCC CATAACGATA AACGTAGTAG CTGGTGTAGT ATAACTTGTA ATGGCAGCGC 12480

CACTaAGACT GCCAATAATT TGACCAACAA CTAACATACT GTTCGTCGTT CCAACAAATG 12540

TGCCTTTAAG TTGTTGATGA CACGCATTCA CGACAACAAA CATGACACTT TGAATCAATG 12600

CACTATATGT TAATCCTTGA AGTATTCTTG CAGCCATTAA AAACTCTATA TTCGTCGCTA 12660

AACCTTGCAG TATCGCACTA CAACCACATG CAATCGTGGC AAATATATAT ACTGATTTAA 12720

CATATGATTT ATCATTAAAG CGTCCCCATA AAGGCGCGCT TAATATCGAA GCCGTCCAAA 12780

ATGCGGACTG TAAAAATCCA ATCACACTAC GGTCATCTAT CGCTGTATGA TTCACTGATG 12840

AAGCAAGTGG TGATAATGCA GTTAGCATGC CATACATAGC AAAGTTTGCT AAAACGCCAA 12900

CGATAATAAA TCGACATGTT TGTTGTGTGC ATAATAGACA TTGAAATGAA CGGCGAATAC 12960

CTTTATTAAT ATTTGGTGTT TGTGATTTTG GCATATGTGT CGTTTCAATC AATTTTAATG 13020

CACCGAAAAT ACAGACAATA AAAGTAATAA CGGCAATACT CATCAGTAAC GCACTTAAAC 13080

CTAATATCGA AGCTGTAACA CCGCCAATTA ATGGCCCCAC AAGAGACCCT GCGCTGACTG 13140

AACTTTGCAG TCTTCCTAAT ACCTTTCCAC GATCTTCAGC TGGCGCCTCT GCACTCGCAA 13200

ACGCACTTGA TGCATCAACA ACACCACCAA ATAGTCCCTG CAATAACCTC ACAAGTACAA 13260

ACTGTAATGG TGTCGTACAC AATGCCATTA AAAATAAGCA TACCGCCAAA CCAAGTAACG 13320
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CtATCATCGT CGTTACAGCT GGAGCAGCAA TCGCTATACC ACTCCACAAC TGTATTTCTA 13440

CGACTGATAG ATTTTGTAGT GATGCCATAT AAATTGGCAA TAATGGCACA AGTACTGTCA 13500

GTCCAGCAAT CGCTATAAAC TGACTGAGCC ATAAAATGCG AAAGTTACTG CGCCATATAG 13560

ACTGATTAAT CATATGTCAC CATTGGATTT GGTACGGTAG TTAAACCTGA AGGCATACTA 13620

CCTCCACCAC TATCACGTTG ATATAGCAAT GGTAATAAAA TTTGTTTGAA TGGCCACGTC 13680

TGTTTATCAA ATAAAATGTG TCTGACAGCT AGCTGATCAG TTGTAACCCA GGAAATAGTT 13740

GCCACTTCAT TTTTTAAAAT TTGTTTTAAC AACGACATAA GTTCATGCTC ACTTACACCA 13800

AATAAATCTT GAATTGCATC AATAATGGCA TATAGATTTA CCGATACAGC TAATGTTTGA 13860

AAATAAGCAA AGAATGTTTC CAAATCCTCA TTAATTAGCG TATTAGGTGT ATCTTCTCTG 13920

ACGACATACT TCGGCAATGA AAGCTGATGT GCTGTTAGCC ATGGTTTATA AATTCTGACA 13980

GTATCATGAT CACGTAACAC GCATTTTTQT ACACGTCCAT CTTCAAATGA CAACAATATA 14040

TTTTGACCAT GCAACTCTGG TAATGCGCCG TATTGCATAA ATGATAGTGT TACCTTTAAA 14100

AAGACTTGCG CGATATCTTC AAATAACGTC ATGACATCAT TTTTAGAAAT ATTATCTTTT 14160

CCACAAATCA TTTGATATAA AGTGCGATCA TTTGCCGCGA GTGCTGCCAT TGACACTAGC 14220

TGTTGGGTAT CATTTTTGGC TAGCACTTCG GGATACTTTC TTAGCTGAAC AGTTAGATGA 14280

CCTAATTGAT CTTTGAAAAT ATCATTATCT TGACCCATAT ATGACCACCA AGCTGTTTCA 14340

TCACAAACCA TGACATACTT AGCTAGTGCT tcatcttttt CTATAAGCTG ACGTAATAAT 14400

TGTTCTGCTT GTTCTCCGTT TTTCATGTAA CGCGTAGGCG TTAGCCTTAA TGCGCCTAAT 14460

GACTGGATTG CAAATGGTAC TTTGACATGG TTATACGGTG CGCCAATATC AATTAATGAA 14520

CGCATACTTG AAGACGACAG ATAATCTCCA AATTTTAACG GTAATAGTAC AACCAACTTT 14580

TCACTAATCT CTTTCGCAAA GACGTTCGGC AGAATATGCT GATATTGCCA AGGATGTACC 14640

GGAAATAGTA CATAGTCATC TATTGATAAC CCTTGATCAT TTAACATGTC TGTCGCTTGT 14700

TCTTTTATAG GTACTGTCAA ATTTTCTAAT TCATCGATAT TTGCAGTATC GCCATGAATC 14760

ATATGTGTCT TTTTAACTGC TGCAACCATT AAAGGAAATG ATTGATTTAA TTCAGCTTGA 14820

TACACTTGAT AATCCGCTTC TCTTAATCCT CTTTTTTCTT TAGCTAATGG ATGAAATGGA 14880

CGATCTTTTA AACTTGCAAA CTGCTCTGAC ATCACAAAAG GATGTGACGC TAAATCTAAT 14940

TCTGATAATT GTTTAGCAAG CTGTGTGGCA GCAGTAGTCA GTCCTTCTTC AACGCGAGCC 15000

ACTTCCCATT CATGACTTAG ATCACAATTC ATATTAGCAA TTGTTTGCCA AAATTCAGCT 15060

GCCGTTAAAG GTTGCTTAGA CACCCTTCCC TCTATCGTAA TTGGTTGTGA ACTTTCGTAA 15120
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TATATCAAAA GCGTTTGTCC GTTTT CTTTA GTAATCTCAC TATTCGATAC AATTCCGGCT 15240 

ATATCTTCAA ATAATAATGC ATCAACTAAA TCTCTTAATA TTATCGCTTG TGCTGTATTG 15300 

ACTGCTGTAT GATTCTGCAA TGTTCAGACA CCTCGCATTC TTAATATAGG TTCAATGTTG 15360 

TCCCAATATT TTGTTGTTGT GCCTGTTGAT AAATAAAATA AGCACTTGAA ATATCTTCGA 15420 

TAGCCATACC CATCGGATTA AGTAATATGA TCTCATCATC GTCTTCACGT CCTGGTATGT 15480 

10 

CACCTGTCAC AAGTTGTCCT AGTTCAGCAT GAAGAGCTTC TTTGCTGAAT TTACCTTCTA 15540 

ACACCAATTG GTTAATAGTT TTCTTTTCTC GATTACATTG TGACCAGTCA TCTACTACGA 15600

CTTTGTCAGC TTTAATAAAG ACTTCTTTAT GCACATCCAT GATAGAAATG TTGCTAATAA 15660

ATGCACCCTT TTGTAACCAA TCATATTCAA TGTATGGTTG ATCCGTTACG GTACATGTAA 15720 

TGACTACTTC ACCATTTGAT ACTGCTTCTT TAGCATTTTC TGTCGCAATA AAATTAATTT 15780 

20 

CCGGACGCTG TTGTTGCCAT CTATCAACAA AGCGTGCACA TGCTTCAGAG AATTGATCGT 15840 

AAACAAACAC GCGTTCAATA TGATCGAATT GCTCTAACAT ACTTTGTAAT TGCTTGTCTC 15900 

CGATTAGCCC GCATCCAATG ATTGTTAAGT CTTTAAATCC TTTTTTAGCC AAATGCTTTG 15960 

25 

CTGCAATCAC TGAAACTGCT GCAGTACGCA TACTACTAAT TAAACTTGCT TCCATAACTG 16020 

CAATTGGATA ATTCGTTTCT GGATCATTCA AAATAATGAC GCCACTTGCA CGCTCCATAT 16080 

30 TACGTTTCGA TGGATTGTCG TGCTTACTAC CTATCCACTT AATACCTGAA ATTGCGTGTT 1614 0 

CACCACCGAT ATGACTTGGC ATTGCAATAA TTCGATCTGC GATGTGTCCA TTTTCAGGAT 16200 

CCtGTCTTAA ATACGGCTTA AGCGGTTGTA CAAAATCATT GTGCGCATGG GCTGTTAATG 16260 

35 CTTCTGTTAA TGCGTCCACA TAAACTTGTG AATGATTACC TCCCGCTTGT TCAATATCTG 16320 

ATCTATTTAA ATACAACATC TCTCTatTCa TTCTGaTTTA ACTCCTTGTC TTGATTTCAT 16380 

TTTTTCTAAC CATGTATCTG AATAAACTAA ATCTAAGTAA CGATCGCCTC GATCTGGTAA 16440 

40 

AATCGTGACA ATTGTTG CAC CTTCTTCAAT TGACGTTATC AACTGCTCAA TCGCTGCAAT 16500 

AATCGAACCT GTTGAAcCTC CGG CAAAT AT GCCTTCATAA TCAATCAGTT TTCGACAGCC 16560 

CAAAGCAGAT TGATAATCAT CTACATGGAT CACTTGATTA ATTTCTGATC TATTCAATAT 16620 

45 

TTCGGGTACA CGACTAGGAC CG AT AC GAGG TAATTCTCTA TTAATAGGTT TGTCACCAAA 16680 

AATGACTGAC CCTTTCG CAT CAACAGCAAC AATTTGTGCG TTTGGATGCA CTTCTTTTAT 16740 

50 TTTTCTACTC ATACCCATAA TGCTACCTGT CGTGCTGACT GGCGCGACAA AATAATCTAT 16800 

AGGTTGCTTA ATTGTTTCAA CAATCTCTGT GCCTGCACCA TGATAATGGG ATTGCCAATT 16860 

TAACTCATTC GCATATTGAT TAATCCAATA TGCATCGTCA ATAGTGGCTA ACAGTTCTTG 16920 

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10 



TACATTGGCA CCATAACTTT TAATAATTTT CAAATTTGTT GGTGATATTT TAGGATCAAC 17040 

AACACACGTG AGTTTTAATC CCTTGATTTT AGCTATCATT GCCAACGCAA TGCCTAAATT 17100

ACCAGAAGTA CTTTCAATTA AATGTGTATT CTCAGTGATT AAACCATGTT TAATACCATG 17160 

TTCAATGATG TACTTGGCAG GTCGATCTTT CATGCTGCCT CCAGGATTCA TATACTCTAA 17220 

CTTTGCAAAC ACTTCATGTT TCGGAAATAG TTGATGAAGT TGAACCATAG GTGTTTGCCC 17280

TACAGAATCT AACAATGAAT CGTGCACATG 17310 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5423 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

ATACTAGTAA GCGCATCGGT TATTGACATC GAATTCAACT TTAACAGTTT TCATGTTCGG 60 

TGATGTTTCa ATAGAATGTG TGTGTTGTAC TTGCGCATTT ATATTTCCAC CTAAATTACT 120

TAAGTTTCCT GTAATACTAG AAATGTCAGG TGCGTTTAAT GTAGGTTGAA ATGCATCAAC 180 

TACTTTATCT GCAACATTAG AAACATTACG GATAACTTTA CTTGAATGAT TATCTATACC 240

TTTAACGAAA CCTAACATTG AATACATACC AACATCCATG AATTCACGTG AAGGTGAGTG 300 

AATACCTAGC GCTCTTTTGG CTGCATTTAA AGCACCTTTT GCTACACTAG CTGCTTTTTC 360 

AGCTAAGTCT CTAGCCATAT TACCAATACC TCTCATCAAA CCACGGATCA TATCAGCACC 420 

TGCTGATACA AAGTCATCCA CAAAGCTTTT AACTTTATTT ACTGCATTTG TCATACCTTG 480 

ACTAACTTTG TTTACAACAT TAACGAATCC TTGAACAACT CTATTAACAA rGTTAATTAG 540 

CGTACtTGTt ATAGTAGATA CCCaTnGCAT ACCTTTAGTG ACmATGAAGT TCCAAGCTTG 600

AGACATTTTG TCTGATATAG TTGAAACAAC TTGTGTGAAT ATGCTTACAA CTTTATTCCA 660 

AATTGTCGTT AATATACCAG ATAAGAAACT CCAAATCGTA TTCCATATAT TAGAAATAAA 720 

ACTCCATGCC GCTTGTAACG CAGTAGATAT AGCTGTAGTG ATAGCGTTCC AAACCTTAGT 780 

TGCCACAGTA ACTATAGTGT TCCACAACGT TTGTAAGAAC GTCCAAATAG CGTTCCAAAT 840 

TGTTATTGCG ATAGTCATAA TTGTGGTAAA CACTGTAGTT ATTACAGTGA CTAACAAATT 900 

CCAAATCGTA GTAGCGATTG TAATTATCGT ATTCCAGATT GTACTTAAGA ACGTCCAAAT 960 

AGCTGTCCAT ATCGTCATAA CTATTGTCAT TATCGTCGTG AAAACAGTTG TAATGATTGT 1020
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ATAAGCGACT ATTTGATTCC AAACAATCAT TATAAAATTG TAAACATTCG ATACTGCTGT 1140 

AGTGATAGCT GTTAAAATAG CATTCCATAC AACCGAAGCT ACAGCTTTTA ATACATTCCA 1200 

AACATTAACC ATAAACGTTT TTATCGCATT CCAAGCATTT ATAATAAAGT TTCTGAATCC 1260 

TTCATTTTTA TTCCACAATA AAACGAATAT AGCTATTAAT GCAGCAATTA CACCAATTAC 1320 

TATTGTTATT GGACCGCCTA AAATACCAAA CACAGTTACT AGTCCTGTGA TAGCATTTCT 1380

AATTAATCCA ATCTTACCGA ATAACAATTG GAATATAACT GATATAATTT TTAATGGTCC 1440 

TTTTAATAAC ATGAACGCAC CTTTTAAAAT TGTTAATCCC GCTCTTAATA AACCGAACTT 1500 

ACTTACTAAT GCAATGrTTC TACCTATTAA TCCGCCACCC ATAAAGTTAG ATACAGCAAG 1560

AATAATCGGT ATTAAAAATC TAAATGCACC AACTAAAGTT ATAATGACAC CAACTAATTG 1620 

TGCTGTAGCT GGATGCGCCT CAAACAAGTT AGCTATCCAA CCAGTTATTG CAACTGCAAC 1680 

GCGTAATACT GCACTAGCTA TAGGAGCCAT TGCTGTTGCG AATGCArmTA ATCCTCTTGC 1740

GATGTTTCCA ATCAATTGCA TTATTAGTGG TCCATTTGTT TGTATATAAC TGACAAAGTC 1800

TTTAAACCCT TGAGATTGTC CTACTTGTTC AGACCATTCC CTAAACTTAG CTGTCATTTG 1860 

TTCAAGAGAT TGGAATATGC CAGTTGATGA TCCGCTGAAT GCATTCATCA AATTGTTAAT 1920 

TCCAACGAAA ACATTTTTGA AAATATTACC AATGATAGGT AAGTTTGTTT TTGTGTATTC 1980 

AATAAAACGA GTTATCGAAT TTTCTCCAGC TGCACTATTA GCCCAGTTAG AGAAAGATTG 2040

ACCTAATCTA TCCAACCAAT CAGCCGACCA TTGAAACAGT GGTGCTAATT GCGTGAATAC 2100 

ATTGACTAAT CCGTCACCAA AACCACCTGC AGCACTTAAT AGCTTGTTAA ATACCGAAAC 2160 

ACCCGTTGTA TTCATCATAT TAAAGAATCT TGAAGCTACA CTGCTATTTT CAGCCCATTT 2220

AAGCACGCTT TGAGACGCTT CTTCCATTCC TCTTGAAATA CCACTAAAAA ACGGTTGTAA 2280 

GCTCTGCATT GCAGTTTTAA CAGTATTTAA ACCATTTGCA AGAGTTGTGA AGATAGCGGA 2340 

TTGATTTTGC TTTATAATAT CAGTCCATGC TGACTTTACG CCATCTAACG CTTTTTTGTA 2400 

TTCGTTTGTT GCTGAGCTAG CTTGTAAAGT GCCATCATTA AGCATCTTTA TAGCGCTGAT 2460 

AGCCATTGCG CCAAACGCTA CAAATCCTGC TCCCGCTATT GCTACGGCAC CACCTAAAGC 2520

AAGTACACCA CCAGTTAACA CTTTGATAGC GTTTAATAGC GCAAATACTA CAGGTACTAC 2580

GCTCGCTATT ACAGGTATTA AGATACTAAA AGATGATGTA AGTAATCCAC CAACCATATT 2640

AGAACCTACA GTACCGAACA CACGGAACAT ATTAGCTAAA TTCCCCATCT GTCTTTGAAA 2700

ATTGTCATTT GCTTTTATTA TGTAGGCATA AGCTTTCTTT AAACCATTAG TATCGACATC 2760 

TACCTTTGTT GTTTTTTTGT TCGGCAATGC GTCTAATGAT TTTTTAAACG CATAAATAGT 2820
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AAGTTCTTCT TTAGTACGTT TGATTTTAGA GTTAGCAACA CCATTGTCCA CGTCTATAAT 2940 

AGCTTTGGCT TTAGACCTAT TTAATGCTTC GAGACTAQCT TTAGATACTT TTAACACTCG 3000

ATTGAATTTA CTGTTATCTG CATTGACGTC AATATTGACA CGTTTCTTTT CTAATTCTGA 3060 

TAATTTAGCT TCTGTTTCAG CGATATCTTT AATCAACTTT TGTTTTTGCA ACTTAACTTC 3120 

TGGTGTAACT TCTTTAGAGT TTAGTTTGTC TAGTTCAAAA TTCGATTCTA GTACCTTTTG 3180 

TTGTAAATCT TGTATACTAG CATCTAATTT AGCTTTTACA TTTTTGTTAC TAAAGGCATC 3240 

TAAAGACTTT TTAGCAACTT TGATAGTTTT TTGTAAATTT TTATCGTTAG CGTTTAATTC 3300

AACATCTTTA GTTTGATCTG CTACTCGTTT AAATCTTTGC ACAGACTTAA CCGCACTATC 3360

AATTTGCCTT TTGAATTTGG CTACACTAGC TTCAATAGTC GCTTTAATTT TATATTCCGT 3420 

CACATTAACA CCTCTCTTTC TATTGCTTAT TAAATTCTGC TATAACTTTA AAGAATTCAT 3480

TATTTTGTGG TTCGTATTCA TCACGTTCGC TACTAAATCT TATATCTTTA CCTTCGTTAA 3540

GCCGTTGGAT ATTTTCTTCA TAAGGCAATA CGTCGTTTGC ATTGTTAAAA ACATATTCCT 3600 

CTTTAGGTTT ATTTTCTGTC CCAACATTTT TAGTAGCTGC AGCATCACGA ATAGCAAACG 3660 

CAAGTTTGTA ACGTTCGAAT TCTTGGGTTA GCATTTCATA CTCTTTCGCA TACATTCGAT 3720 

AGTTATATTC TGTTAATGTC ATTTGCTCAA TAACGTTCAA ATCTGTAATA CCAAGTGTTG 3780 

ACATACAAGT TATAACGATT CTGTCGTAAG TTATTAGGcT TCCGCTGGTT TTTCTTCCGT 3840

TTCCACTACT TCGACTAGGT TTCGGGTCAT AGGTCGCTTT CCCAAcTCCG TTAAAATATC 3900

CGAACGGAAT TCTTCTAGTC CGATATTTTC TGCGATTTCA TCTAATGCTT CATCAATGTT 3960

ATTAATAGTA ATTGCTTGTT TTTTTAAGTG AGATGTAGCT GCGATTAAAA cTTCGCCAAT 4020

CACAACCGGA TTTCCACTTT CTAAACCTAC AGGCAACATT GATACACCTT GACCGATAGA 4080

AGCTTGTTCA ACTTTTAAAC CTAATCGGTT ATCGATTTCT CTTAAAAATT TAAAACCAAA 4140

ACTTAATTCT AATGACTTTC CGTTAATTTC TACATTCATA ACTTAAAATC TCCATTCATA 4200

ATTAATTTAA ACAAAATAAA mArGCTTAAC GCCCTATTTT TATACCTCTC TTGGTGCAAC 4260 

CGGTGGTGAA TCTACTTTAG GTTGTGGAAT TGCTGTTAAA TCTTCGCCAG TTAATGCATC 4320 

TGCTTTTGTA GTGTCGTGGA ATCTGTATcC AGTCGCCTTA AGTTTCTTTG TTACAGCCTC 4380

AGGTAGTGTT GCAAATCCAC GTTGGAAACG ACCATTCACT CCATATTCAT ATTCATATTC 4440

ATCAATACCG TTAGCTTCTG CTTTTAATTC AAATTTATTG TGGAAACCTT GGAAATATTT 4500

CGCTTTAAAT TTAGCGGAAT CCCCATTTTT GCCTGGTATT CTACTTTCAA CTTCCCAAGC 4560

TTCATACAAT ACGCGATCTA CAACTGCATC TTCAATTTCA TCTGCAAAAT CGTCACCATA 4620
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5 



CTCCATTGTA TCCTCTGTAT CTGTATCAGC TTCATGTGAT AAGCCGTATT CAGTTAAAAA 4740

AAGCATTTTA GTAGCATCTA CTTTTTCGCC AGCTTTTCTA AATAAAATAA TACGATCATT 4800

ACTATTTTTC ATATTTGCCA TTCAATATTC CTCCGTTTTT TAAAATGTTT TGTAAGATAT 4860

CGTTACTGAT GTGTGTAGCA ATTCTTGATT GGTAGTATCA TCAACTAACT GTGTGATGTT 4920

AGTATCTTCT TCTTCAAAGT CATAATCGTT TGTTTTAACG CTAGGTGTTA AATCATCAAT 4980

ACATCTTTTA ACAAGTCCGT CATGATGTCC TAAATCATCG CTTACACTCC AAATATCAAT 5040

AACTAAATTC GTATCGCCAG AATAACTATC AAACGTGTAC TTACTTCTAT TTGACTCCGG 5100

CATTTTTATT ACAAAAAAAG GATACGGAAT CTCTTGTTGC ATCTCTTTAC GAGAAATAAC 5160

AGGGAATCCA TATCCTTGTA GCGTTTCATA CGCTTTATTA TAAAGTTGTA AGTTCGGTGT 5220

CATGCTTTTA TCTCCTATTC AAACAACGCT TTCAATTCTT CTACAGTTGA TTTCCTAATC 5280

ACTTCGTATA CCGGCCACAT AAAAGGTTCA GCCTCCATGT ATCGAGTACC AAATTCTAAG 5340

AAACCACTAT AAGCTGCGTG CGATGTGATA GTGTATTGCA AATCGCCAGT •rrri-rrATAT 5400

CTGATATTGC GTGATaAATT ACC 5423

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6251 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

AAACGCAGAT GTTCAATTAG AACCAGTCTA TCGTATTAAG GAAGGTATTA AACAAAAGCA 60

AATACGAGAC CAAATTAGAC AAGCGTTAAA TGATGTGACA ATTCATGAAT GGTTAACTGA 120

TGAACTAAGA GAAAAATATA AATTAGAGAC CTTGGACTTT ACTTTGAACA CATTACATCA 180

TCCTAAAAGT AAAGAGGATT TATTACGTGC TCGTAGAACC TATGCATTTA CTGAACTGTT 240

TTTATTCGAA TTACGTATGC AATGGCTAAA TAGATTAGAA AAGTCATCTG ACGAAGCAAT 300

TGAAATTGAT TATGACATAG ACCAAGTTAA ATCATTTATT GATCGTTTAC CTTTTGAACT 360

AACTGAAGCA CAGAAATCCA GTGTTAATGA AATTTTTAGA GATTTAAAAG CACCAATACG 420

TATGCATCGA TTACTTCAAG GTGATGTAGG TTCAGGAAAA ACAGTAGTTG CTGCAATTTG 480

TATGTATGCG TTAAAAACTG CTGGTTATCA ATCAGCATTG ATGGTACCAA CTGAAATTTT 540

AGCAGAGCAA CATGCTGAAA GTTTAATGGC TTTATTTGGA GATTCTATGA ACGTTGCATT 600
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TACGATTGAT TGTTTAATTG GAACCCATQC TTTGATTCAA GATGATGTGA TTTTCCATAA 720 

TGTTGGTTTA GTAATTACAG ATGAACAACA TCGATTTGGT GTGAATCAAC GCCAGCTTTT 780 

AAGAGAAAAA GGTGCAATGA CGAATGTGTT ATTTATGACA GCAACGCCGA TACCAAGAAC 840 

ACTAGCAATA TCAGTTTTTG GTGAGATGGA TGTGTCTTCA ATTAAACAAT TACCAAAAGG 900 

TCGTAAACCT ATCATTACTA CTTGGGCAAA GCATGAGCAA TACGATAAAG TTTTGATGCA 960

AATGACCTCA GAGTTGAAAA AAGGTCGTCA AGCATATGTC ATTTGCCCGC TAATAGAAAG 1020 

TTCTGAGCAT CTCGAAGATG TTCAAAATGT TGTCGCATTG TACGAOTCTT TACAACAGTA 1080 

TTATGGTGTT TCCCGTGTAG GGTTATTGCA TGGTAAGTTA TCTGCCGATG AAAAAGATGA 1140

GGTCATGCAA AAGTTTAGTA ATCATGAGAT AAATGTTTTA GTTTCTACTA CTGTTGTTGA 1200 

AGTAGGTGTT AATGTACCOA ATGCAACTTT TATGATGATT TATGATGCGG ATCGCTTTGG 1260 

ATTATCAACT TTACATCAGT TACGCGGTCG TGTAGGTAGA AGTGACCAGC AAAGTTACTG 1320 

TGTTTTAATT GCATCCCCTA AAACAGAAAC AGGAATTGAA AGAATGACAA TTATGACACA 1380 

AACAACGGAT GGATTTGAAT TGAGTGAACG AGACTTAGAA ATGCGTGGTC CTGGAGATTT 1440

CTTTGGTGTT AAACAAAGTG GaTTGCCAGA TTTCTTAGTT GCCAATTTAG TTGAAGATTA 1500

TCGTATGTTA GAAGTTGCTC GTGATGAAGC AGCTGAACTT ATTCAATCTG GCGTATTCTT 1560

TGAAAATACG TATCAACATT TACGTCATTT TGTTGAAGAA AATTTATTAC ATCGTAGTTT 1620

TGACTAATTG CCATGCTGAT TTGTCAATTT GAGTGCAACa CTTCGTTAAT TGAGTGATAT 1680

GACACTTGAA CTATTTAAAT GTAAAGTGGT ATTTTAAGAA TTTATAAATT TTCGACTAAA 1740

TAATAGCTAA ATATTACAGT TATTTGTTGA GTCGGTTAAA TAGAAAGTGT TATGATATGT 1800

GAGGAATGTT TAAGACTAGG TACTAAAAAA TGAGGGGTGA GACGTTGAAA CTAAAGAAAG 1860 

ATAAACGTAG AGAAGCAATC AGACAACAAA TTGATAGCAA TCCCTTCATC ACAGACCATG 1920

AACTAAGCGA CTTATTTCAA GTGAGTATAC AAACAATTCG TTtAGaTCGC ACTTATTTAA 1980

ACATACCAGA ATTAAGGAAG CGTATTAAAT TAGTTGCTGA AAAGAATTAT GACCAAATAA 2040

GTTCTATTGA AGAACAAGAA TTTATTGGTG ATTTGATTCA AGTCAATCCa AATGTTAAAG 2100 

CGCAATCAAT TTTAGATATT ACATCGGATT CTGTTTTTCA TAAAACTGGA ATTGCGCGTG 2160 

GTCATGTGCT GTTTGCTCAG GCAAATTCGT TATGTGTTGC GCTAATTAAG CAACCAACAG 2220 

TTTTAACTCA TGAGAGTAGC ATTCAATTTA TTGAAAAAGT AAAATTAAAT GATACGGTAA 2280

GAGCAGAAGC ACGAGTTGTA AATCAAACTG CAAAACATTA TTACGTCGAA GTAAAGTCAT 2340 

ATGTTAAACA TACATTAGTT TTCAAAGGAA ATTTTAAAAT GTTTTATGAT AAGCGAGGAT 2400
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TTAGAAGCCG TACAAAAGGC TGTTGAAGAC TTTAAAGATC TAGAAATTAT ACTTTTCGGT 2520 

GACGAAAAAA AGTATAATCT GAACCATGAA CGAATCGAAT TTAGACATTG TTCTGAAAAG 2580 

ATTGAAATGG AAGATGAGCC TGTTAGAGCG ATTAAACGTA AAAAAGATAG CTCAATGGTA 2640 

AAAATGGCTG AAGCTGTGAA ATCTGGTGAA GCAGATGGAT GTGTGTCAGC AGGTAATACT 2700 

GGTGCTTTAA TGTCAGCTGG TTTATTCATT GTTGGACGTA TTAAAGGTGT AGCTAGACCG 2760 

GCTTTAGTAG TAACATTGCC AACGATTGAT GGAAAAGGTT TTGTCTTTTT AGACGTTGGT 2820 

GCAAATGCTG ATGCTAAACC TGAACACTTA TTACAGTATG CGCAACTAGG GGATATTTAT 2880

GCTCAAAAAA TTAGAGGTAT TGATAATCCG AAAATCTCAT TATTAAATAT AGGAACCGAG 2940

CCAGCTAAAG GTAATAGTTT AACGAAAAAA TCATATGAGT TATTAAATCA TGATCATTCA 3000 

TTGAATTTTG TTGGGAATAT TGAAGCGAAG ACATTAATGG ATGGCGATAC AGATGTTGTA 3060 

GTTACCGATG GCTATACTGG GAACATGGTC CTTAAAAATT TAGAAGGTAC TGCAAAATCA 3120

ATCGGTAAAA TGTTAAAAGA TACGATTATG AGTAGTACTA AAAATAAATT AGCAGGTGCA 3180 

ATATTGAAGA AAGATTTAGC TGAATTCGCT AAAAAGATGG ATTACTCAGA ATACGGTGGT 3240 

TCCGTATTAT TAGGATTGGA AGGTACTGTA GTTAAAGCAC ACGGTAGTTC AAATGCTAAA 3300 

GCTTTTTATT CTGCAATTAG ACAAGCGAAA ATCGCAGGAG AAGAAAATAT TGTACAAACA 3360 

ATGAAAGAGA CTGTAGGTGA AtCAAATGaG TaAAACAGCA ATTATTTTTC CGGGACAAGG 3420

TGCCCAAAAA GTTGGTATGG CGCAAGATTT GTTTAACAAC AATGATCAAG CAACTGAAAT 3480 

TTTAACTTCA GCAGCGAACA CATTAGACTT TGATATTTTA GAGACAATGT TTACTGATGA 3540

AGAAGGTAAA TTGGGTGAAA CTGAAAACAC ACAACCAGCT TTaTTGaCGC aTAGTTCGGC 3600

ATTATTAGCA GCGCTAAAAA ATTTGAATCC TGATTTTACT ATGGGGCATA GTTTAGGTGA 3660 

ATATTCAAGT TTAGTTGCAG CTGACGTATT ATCATTTGAA GATGCAGTTA AAATTGTTAG 3720 

AAAACGTGGT CAATTAATGG CGCAAGCATT TCCTACTGGT GTAGGAAGCA TGGCTGCAGT 3780 

ATTGGGATTA GATTTTGATA AAGTCGATGA AATTTGTAAG TCATTATCAT CTGATGACAA 3840

AATAATTGAA CCAGCAAACA TTAATTGCCC AGGTCAAATT GTTGTTTCAG GTCACAAAGC 3900

TTTAATTGAT GAGCTAGTAG AAAAAGGTAA ATCATTAGGT GCAAAACGTG TCATGCCTTT 3960 

AGCAGTATCT GGACCATTCC ATTCATCGCT AATGAAAGTG ATTGAAGAAG ATTTTTCAAG 4020 

TTACATTAAT CAATTTGAAT GGCGTGATGC TAAGTTTCCT GTAGTTCAAA ATGTAAATGC 4080

GCAAGGTGAA ACTGACAAAG AAGTAATTAA ATCTAATATG GTCAAGCAAT TATATTCACC 4140 

AGTACAATTC ATTAACTCAA CAGAATGGCT AATAGACCAA GGTGTTGATC ATTTTATTGA 4200 

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10 






AACATCAATT CAAACTTTAG AAGATGTGAA AGGATGGAAT GAAAATGACT AAGAGTGCTT 4320 

TAGTAACAGG TGCATCAAGA GGAATTGGAC GTAGTATTGC GTTACAATTA GCAGAAGAAG 4380 

GATATAATGT AGCAGTAAAC TATGCAGGCA GCAAAGAGAA AGCTGAAGcA GTAGTCGAAG 4440 

AAATCAAAGC TAAAGGTGTT GACAGTTTTG CGATTCAAGC AAATGTTGCC GATGCTGATG 4500

AAGTTAAAGC AATGATTAAA GAAGTAGTTA GCCAATTTGG TTCTTTAGAT GTTTTAGTAA 4560 

ATAATGCAGG TATTACTCGC GATAATTTAT TAATGCGTAT GAAAGAACAA GAGTGGGATG 4620 

ATGTTATTGA CACAAACTTA AAAGGTGTAT TTAACTGTAT CCAAAAAGCA ACACCACAAA 4680 

TGTTAAGACA ACGTAGTGGT GCTATCATCA ATTTATCAAG TGTTGTTGGA GCAGTAGGTA 4740 

ATCCGGGACA AGCAAACTAT GTTGCAACAA AAGCAGGTGT TATTGGTTTA ACTAAATCTG 4800 

CGGCGCGTGA ATTAGCATCT CGTGGTATCA CTGTAAATGC AGTTGCACCT GGTTTTATTG 4860 

TTTCTGATAT GACAGATGCT TTAAGTGATG AGCTTAAAGA ACAAATGTTG ACTCAAATTC 4920

CGTTAGCACG TTTTGGTCAA GACACAGATA TTGCTAATAC AGTAGCGTTC TTAGCATCAG 4980

ACAAAGCAAA ATATATTACA GGTCAAACAA TCCATGTAAA TGGTGGAATG TACATGTAAT 5040 

ATATTTGAGC TAAAGCTCAT TGACGCAGTG GTTGACTGGT CATCCAATGG AGAATTGTCT 5100

GACCTAGTCA ACTTTGCGGG GGAAATTCTA AGCAACCTAG ATAAGGTTCC AGAATTTCTC 5160

CCTAAGAAAC ACTAATCAAT aAATTGwTAA GTGTTTCTAA AATTTCTACT TGTTTTTTAG 5220 

AATTTAAAAT GGGAAAATAT AGTAGTCTAT GTATAGGCAT TTTTAAAGGA GGTGAATCGA 5280

CGTGGAAAAT TTCGATAAAG TAAAAGATAT CATCGTTGAC CgTTTAGGTG TAGACGCTGA 5340

TAAAGTAACT GAAGATGCAT CTTTCAAAGA TGATTTAGGC GCTGACTCAC TTGATATCGC 5400

TGAATTAGTA ATGGAATTAG AAGACGAGTT TGGTACTGAA ATTCCTGATG AAGAnGCTGA 5460 

AAAA£*TCAAC ACTGTTGGTG ATGCTGTTAA ATTTATTAAC AGTCTTGAAA AATAATAAAT 5520 

CTTACATCTG GGTCGTCAGT ATTGTCGACT CAGTTTTTTT CTTTAATTAT CAATAGTTTT 5580 

AACGTAAAAT TAAAGATGAT TCAAGAGCAA CACATAAAGG AGATAAAATA ATGTCTAAAC 5640 

AAAAGAAAAG TGAGATAGTT AATCGTTTTA GAAAGCGCTT TGATACTAAA ATGACAGAGT 5700 

TAGGCTTTAC TTATCAAAAT ATTGATTTAT ACCAACAAGC ATTTTCGCAT TCGAGTTTTA 5760 

TTAATGATTT TAATATGAAT CGTTTAGACC ATAATGAGCG TTTAGAGTTT TTGGGTGATG 5820 

CGGTATTAGA ATTGACGGTT TCACGATATT TATTTGATAa ACATCCCAAC TTGCCAGAAG 5880 

GGAATTTAAC AAAAATGCGT GCCaCTATTG TATGTGAGCC CtCACTkGTA ATATTTGCGA 5940

ATAAAATTGG ATTGAACGAA ATGATTTTAC TTGGTAAAGG TGAAGAGAAA ACAGGGGGAC 6000
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ATCAAGGACT AGATATAGTT TGGAAATTTG CTGAGAAAGT CATTTTCCCA CATGTAGAAC 6120 
AAAATGAGTT ATTAGGCGTG GTAGATTTTA AAACACAATT CCAAGAATAT GTGCACCAGC 6180 

AAAATAAAGG TGATGTAACC TATAATTTAA TAAAAGAAGA GGGACCGGCA CATCATCGTC 6240 
TATTCACTTC A 6251 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4920 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

ACCTACTGAA GTTGCTAATT TTTTGGAGCA ACTAAGCACT GAAATTGAAC GTCTTAAAGA 60 

AGATAAAAAA CAACTTGAAA AAGTAATCGA AGAGAGaGAT ACTAATATTA AGTCTTATCA 120 

AGACGTGgCA TCAATCTGTA AGTGaTGCTT TGATACAAGC TCAAAAAGCT GGTGAAGAAA 180 

CTAAGCAAGC TGCAGAGAAA CAAGCTGAAG CGATTATAGC TAAGGCAGAA GCGCAAgcTA 240

ATcAAATGGT TGGTGACGCG GTAGAAAAAG CACGCCGTTT AGCATTCCAG ACTGAAGATA 300 

TGAAACGTCA ATCAAAAGTA TTTAGATCGC GTTTCCGTAT GTTAGTTGAA GCGCAATTAG 360

ACTTATTAAA AAACGAAGAT TGGGATTACT TGTTGAATTA TGATTTAGAC GCTGAACAAG 420 

TGACGCTTGA AAATATTCAT CATTTGCATG AAAATGATTT AAAGCCAGAT GAAGTTGCAG 480 

CAAATGCACA AAATAATGCA TCAAATACAC CAGACAATAA TCAACAATCC AATGATTCAG 540

AAACAACTAA GAAGTAAGAA TTAAATAAAG ACAGACGCGT AATATACATT TAACTTTTCA 600 

CAGCGAATTA GGTAATGGTG AGAGCCTAGT AAAAGCATGT ATGTTATATC ACTGGCTTTT 660 

TAATATTTAA ATAATGTAAT GAGAGAACTC TAAGTTGAGT TAATAAGGGT GGTACCGCGA 720 

GCAATCGTCC CTTTTAATTT AACTTAGAGT TTTTTAAATT TTTAAGGAGT GAAAAAAATG 780 

GATTACAAAG AAACGTTATT AATGCCTAAA ACAGATTTCC CAATGCGAGG TGGTTTACCA 840 

AACAAGGAAC CGCAAATTCA AGAAAAATGG GATGCAGAAG ATCAATACCA TAAAGCGTTA 900 

GAAAAAAATA AAGGTAACGA AACATTCATT TTACATGATG GCCCACCATA CGCGAATGGT 960 

AACTTACATA TGGGACATGC CTTGAACAAA ATTTTAAAAG ACTTTATTGT ACGTTATAAA 1020

ACTATGCAAG GGTTCTATGC ACCATACGTA CCAGGTTGGG ATACACATGG TTTACCAATT 1080

GAACAAGCAT TAACGAAAAA AGGTGTTGAC CGAAAGAAAA TGTCAACAGC TGAATTCCGT 1140 

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TTAGGTGTTC GTGGTGACTT TAATGATCCA TATATTACAT TAAAACCTG A ATACGAAGCT 1260 

GCACAAATTC GTATTTTTGG AGAAATGGCA GATAAAGGTT TAATTTATAA AGGTAAAAAG 1320 

CCAGTTTATT GGTCTCCTTC AAGTGAGTCT TCATTAGCAG AAGCAGAAAT TGAATATCAC 1380 

GATAAACGTT CAGCATCAAT TTACGTTGCA TTTGACGTTA AAGATGACAA AGGTGTCGTT 1440 

GATGCAGATG CTAAATTTAT TATCTGGACA ACAACGCCAT GGACAATTCC ATCAAATGTT 1500 

GCGATTACCG TTCATCCTGA ATTAAAATAT GGTCAATACA ATGTAAATGG CGAAAAATAT 1560 

ATTATTGCAG AAGCCTTGTC TGACGCTGTA GCAGAAGCAC TGGaTTGGGA TAAAGCATCA 1620 

ATCAAATTAG AAAAAGAATA CACAGGTAAA GAATTAGAGT ATGTTGTAGC ACAACATCCA 1680

TTCTTAGACA GAGAATCGTT AGTGATTAAT GGTGATCATG TTACTACAGA TGCTGGTACA 1740 

GgTTGTGTAC ATACAGCACC AGGTCACGGG GAAGATGACT ATATTGTTGG TCAAAAATAT 1800 

GAATTGCCAG TAATTAGTCC AATCGATGAT AAAGGTGTAT TTACTGAAGA AGGCGGCCAA 1860

TTTGAAGGGA TGTTCTATGA TAAAGCTAAT AAAGCCGTTA CTGATTTATT AACAGAAAAA 1920 

GGTGCACTAT TAAAATTAGA CTTTATTACA CATAGCTATC CACACGACTG GAGAACAAAA 1980

AAACCTGTAA TCTTCCGTGC TACACCACAA TGGTTTGCCT CAATCAGTAA AGTAAGACAA 2040

GATATTTTAG ATGCAATCGA AAATACAAAC TTCAAAGTAA ATTGGGGTAA AACACGTATT 2100 

TACAATATGG TTCGTGACCG TGGCGAATGG GTTATTTCTC GTCAACGTGT GTGGGGTGTA 2160

CCGTTACCAG TATTTTATGC TGAAAATGGC GAAATTATCA TGACGAAAGA AACAGTGAAT 2220

CATGTTGCTG ATTTATTTGC AGAACACGGT TCAAATATTT GGTTTGAAAG AGAAGCGAAA 2280 

GACTTACTAC CAGAAGGATT TACACATCCA GGCAGCCCTA ACGGTACATT TACTAAAGAA 2340

ACAGACATTA TGGACGTTTG GTTTGATTCT GGTTCATCAC ACCGTGGCGT GTTGGAAACA 2400 

AGACCGGAAT TAAGTTTCCC AGCGGATATG TATTTAGAAG GTAGTGACCA ATATCGTGGT 2460 

TGGTTCAACT CTTCTATCAC AACTTCAGTT GCTACAAGAG GAGTATCACC TTATAAATTC 2520

TTACTTTCTC ATGGTTTTGT TATGGACGGT GAAGGTAAGA AAATGAGTAA ATCTTTAGGT 2580 

AATGTGATTG TACCTGACCA AGTGGTTAAA CAAAAAGGTG CTGATATTGC GAGACTTTGG 2640

GTAAGTAGTA CGGACTATTT AGCTGATGTT AGAATTTCTG ATGAAATTTT AAAACAAACA 2700 

TCTGATGTTT ATCGTAAAAT CAGAAATACA TTAAGATTTA TGTTAGGTAA CATTAACGAT 2760 

TTCAATCCTG ACACAGATAG CATTCCTGAA TCAGAGTTAT TAGAAGTGGA TCGTTACTTG 2820

CTAAATCGTT TACGTGAATT TACTGCAAGT ACGATTAACA ACTATGAAAA CTTTGACTAC 2880

TTAAATATTT ATCAAGAAGT TCAAAACTTT ATCAATGTTG AGTTAAGTAA TTTCTATTTG 2940 
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CAAAPArtTCT 


Inln 1 V—/-VMJ-1 X TTTAGTTGAT ATGACGAAGT TGTTAGCACC AATCTTAGTG 3060

pbtr r* a rinv? \-J-kX nWVVlU X\ar a 21 p a a p tt^tv* AAvaAAVj ± 1 1 rt-TCATACA CCACATGTTA AAGAAGAAAG TGTTCACTTA 3120

ftp a rz a r*a Tfir* AVjAAvSTAGAT CAAGCTTTAT TGGATAAATG GCGTACATTT 3180

Aiu/vil X X/\L. LfAACCGTGCA TTAGAAACTG CTCGTAATGA AAAAGTTATT 3240

WlnnnlLni 1 AoAAbL 1 AA AGTTACGATT GCTAGTAACG ATAAATTTAA TGCATCTGAA 3300

*1~1"LT±"1'AACTT CATTTGATGC ATTACATCAA TTATTTATCG TGTCACAAGT TAAAGTTGTA 3360

GATAAGTTAG ACGATCAGGC AACAGCTTAT GAACATGGTG ATATTGTCAT CGAACATGCA 3420

GATGGTGAAA AATGTGAAAG ATGTTGGAAC TATTCAGAGG ATCTTGGTGC TGTTGATGAA 3480

TTGACGCATC TATGTCCACG ATGCCAACAA GTTGTAAAAT CACTTGTATA ATTGAAATTG 3540

TATAAAGTAC TCATACAGAT GATATAAATT AAAGCTCTCT TCATAATCAT GTTGTAGTTT 3600

TTGTTGACAT GATGAAGAGA GTTTTTTTGT GAATAAAAAA ATGACCAAGT TACCGGTCAT 3660

ATATGTAAAA AATGTGCGAT TTACTAAAAT AAAAATTATT CAGGAATGGT ACAAATTCTC 3720

TGAGGCATAT AAATGCGTTA TAGTTGCTAT TCTCAATTAT GTTCGCGATA ATTTTAAGTA 3780

AAAGTAAGCA CAGATATTGA ATTTGATAGG AGTTAATTGA ATGTATCATA ACAGTAACGC 3840

AAACTTTGTC AATGGTATCA CTTTAAATGT GAGAGATAAG AATGAATTAA AGCCATTTTA 3900

TGAGGACATA TTAGGATTAA ATATTATAAA TGAGACATTA ACATCGATAC AATATGAAGT 3960

AGGTCAAAAT AATCATGTCA TTACACTTGT TGAATTACAA AATGGACGTG AACCTTTAAT 4020

GTCCGAAGCG GGACTGTTTC ATATCGCAAT TAAACTACCT CAAATTAGTG ATTTAGCTAA 4080

TTTACTAATT CATTTAAGCG AATATGATAT TCCAGTTAAC GGAGGTATAC AGCCTGCTTC 4140

GTTATCATTA TTTTTTGAAG ACCCGGAAGG AAACGGTTTT AAATTTTATG TTGATAAAGA 4200

CGAAGCGCAA TGGACGAGGC AAAATAATTT AGTAAAAATT GATATTAGAC CATTAAATGT 4260

ACCGAGATTA GTGAGTCATG CAACAAAATT GTTATGGTTA GGTATTCCAG ATGACGCTAT 4320

1A I AGGx CiCJA TTGCATATTA AGACAATTCA TTTATCAGAG GTAAAAGAGT ACTACCTCGA 4380

TTATTTTGGA TTAGAGCAAT CGGCATATAT GGATGATTAT TCAATATTTT TAGCATCGAA 4440

TGGCTATTAT CAACATTTGG CCATGAATGA TTGGGTATCA GCAACGAAAC GTGTAGAAAA 4500

TTTTGATACG TATGGATTAG CAATTGTTGA CTTTCATTAT CCTGAAACAA CACATTTAAA 4560

TTTACAAGGT CCGGATGGTA TCTATTATCG CTTTAATCAT ATCGAAGTTG AAGATTAGTA 4620

TATACTTTGA ATGGACGAAC CATATAATGA ATCGTTTTTA ATGATCTTTT TATACAAGTT 4680

ATGAAGGAGG CTGGGACATT AAGTTCTTAG GCAATGTAAA AAGCTGATTT CTATTAATTA 4740
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w 






TTTTCCTTAT ATTAATTGCC ATTAATACAA AACCTAGCTC TCGTTTAACT TTATTTATTC 4860 
CTCGAACTGA CATTCGnGTG AACTCAAAAT nGCCTACTTn CTTAAATTAC CAATATCTAT 4920 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 626 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

TGGATTGCCA TTACATGGAC AAGATTTAAC TGAATCAATT ACACCATATG AAGGTGOTAT 60 

CGCTTTTGCA AGTAAACCAT TAATTGATGC TGATTTTATT GGTAAATCTG TATTAAAAGA 120 

TGAAAAAGAA AATGGTGCAC CAAGAAGAAC AGTGGGATTA GAATTACTTG AAAAAGGAAT 180

TGCAAGAACT GGTTATGAAG TTATGGATTT AGATGGAAAT ATTATTGOAG AAGTAACTTC 240 

AGGAACACAG TCTCCATCAT CAGGAAAATC AATTGCACTT GCAATGATAA AAAGAGATGA 300 

GTTTGAAATG GGTAGAGAGT TGCTTGTTCA AGTTCGTAAG CGTCAATTAA AAGCGAAAAT 360 

TGTTAAGAAA AATCAAATTG ATAAATAATT AAAAAGGGGT GTGCATTGTG AGTCATCGTT 420

ATATACCTTT AACTGAAAAA GACAAGCAAG AAATGTTACA AACAATTGGT GCAAAATCTA 480

TAGGAGAATT ATTCGGTGAT GTACCAAGTG ACATTTTATT AAATAGAGAT TTAAATATTG 540

CTGAAGGCGA ACGGAGAACA ACGTTACTTA GAAGATTnAA TCGCATTGCA AGCAAGAGTA 600

TCACTAGAGG AACGCGTACA TCGTTT 626

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1126 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
nGGAAGTGGT GTATATATTT GTAATGAGTG TATTGAATTA TGCTCAGAAA TCGTCGAAGA 60 
AGAATTAGCT CAAAACACTT CTGAAGCGAT GACAGAATTA CCTACTCCTA AAGAAATTAT 120
GGATCATTTA AACGAATATG TTATTGGTCA AGAAAAAGCT AAAAAATCTT TAGCTGTAGC 180
TGTTTATAAC CACTATAAGC GTATTCAACA ATTAGGACCA AAAGAAGATG ATGTTGAATT 240




AACCTTAGCC AAGACGTTGA ATGTACCATT TGCAATTGCA GATGCGACAA GTTTAACTGA 360

AGCTGGTTAT GTAGGCGATG ATGTTGAAAA TATCTTGTTG AGATTAATTC AAGCAGCTGA 420

CTTTGACATT GATAAAGCCG AAAAAGGTAT TATTTATGTA GATGAAATTG ATAAAATTGC 480

ACGTAAATCT GAAAACACAT CTATAACACG TGACGTTTCA GGTGAAGGTG TTCAACAAGC 540

ATTGCTTAAA ATCTTAGAAG GTACGACTGC AAGTGTTCCG CCACAAGGTG GACGCAAACA 600

TCCAAACCAA GAAATGATTC AAATTGATAC AACAAATATC TTATTTATTC TTGGTGGTGC 660

CTTTGATGGT ATTGAAGAAG TGATTAAGCG CCGTCTTGGT GAAAAAGTTA TTGGTTTCTC 720

AAGCAATGAA GCTGATAAAT ATGACGAACA AGCATTATTA GCACAAATTC GCCCAGAAGA 780

TTTGCAAGCC TATGGTTTGA TTCCTGAATT TATCGGAOOT GTGCCAATTG TAGCTAATTT 840

AGAAACATTA GATGTAACTG CGTTGAAAAA CATCTTAACG CAACCTAAAA ATGCACTTGT 900

GAAACAATAT ACTAAAATGC TGGAATTAGA TGATGTGGAT TTAGAGTTCA CTGAAGAAGC 960

TTTATCAGCA ATTAGTGAAA AAGCAATTGA AAGAAAAACA GGTGCGCGTG GTTTAOGTTC 1020

AATCATAGAA GAATCGTTAA TCGATATTAT GTTTGATGTG CCTTCTAACG AAAATGTAAC 1080

GAaGGTAGTT ATTACAGCAC AAACmATTAA TGrAGaACTG AACCAG 1126

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4392 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

ATTGACTTCT TAGCAATnAA TaTGAGTGAA GAACGTACTG TTGAAGTACC AGTTCAATTA 60 

GTTGGTGAAG CAGTAGGCGC TAAAGAAGGC GGCGTAGTTG AACAACCATT ATTCAACTTA 120 

GAAGTAACTG CTACTCCAGA CAATATTCCA GAAGCAATCG AAGTAGACAT TACTGAATTA 180 

AACATTAACG ACAGCTTAAC TGTTGCTGAT GTTAAAGTAA CTGGCGACTT CAAAATCGAA 240 

AACGATTCAG CTGAATCAGT AGTAACAGTA GTTGCTCCAA CTGAAGAACC AACTGAAGAA 300 

GAAATCGAAG CTATGGAAGG CGAACAACAA ACTGAAGAAC CAGAAGTTGT TGGCGAAAGC 360 

AAAGAAGACG AAGAAAAAAC TGAAGAGTAA TTTTAATCTG TTACATTAAA GTTTTTATAC 420

TTTGTTTAAC AAGCACTGTG CTTATTTTAA TATAAGCATG GTGCTTTTTG TGTTATTATA 480

AAGCTTAATT AAACTTTATT ACTTTGTACT AAAGTTTAAT TAATTTTAGT GAGTAAAAGA 540
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CTTACTAAGC TAAAGAATAA TGATAATTGA TGGCAATGGC GGAAAATOGA TGTTOTCATT 660 

ATAATAATAA ATGAAACAAT TATGTTGGAG GTAAACACGC ATGAAATGTA TTGTAGGTCT 720 

AGGTAATATA GGTAAACGTT TTGAACTTAC AAGACATAAT ATCGGCTTTG AAGTCGTTGA 780 

TTATATTTTA GAGAAAAATA ATTTTTCATT AGATAAACAA AAGTTTAAAG GTGCATATAC 840 

AATTGAACGA ATGAACGGCG ATAAAGTGTT ATTTATCGAA CCAATGACAA TGATGAATTT 900

GTCAGGTGAA GCaGTTGCAC CGATTATGGA TTATTACAAT GTTAATCCAG AAGATTTAAT 960 

TGTCTTATAT GATGATTTAG ATTTAGAACA AGGACAAGTT CGCTTAAGAC AAAAAGGAAG 1020 

TGCGGGCGGT CACAATGGTA TGAAATCAAT TATTAAAATG CTTGGTACAG ACCAATTTAA 1080

ACGTATTOGT ATTGGTGTGG GAAGACCAAC GAATGGTATG ACGGTACCTG ATTATGTTTT 1140 

ACAACGCTTT TCAAATGATG AAATGGTAAC GATGGAAAAA GTTATCGAAC ACGCAGGACG 1200 

CGCAATTOAA AAGTTTGTTG AAACATCACG ATTTGACCAT GTTATGAATG AATTTAATGG 1260 

TGAAGTGAAA TAATGACAAT ATTGACAACG CTTATAAAAG AAGATAATCA TTTTCAAGAC 1320 

CTTAATCAGG TATTTGGACA AGCAAACACA CTAGTAACTG GTCTTTCCCC GTCAGCTAAA 1380

GTGACGATGA TTGCTGAAAA ATATGCACAA AGTAATCAAC AGTTATTATT AATTACCAAT 1440

AATTTATACC AAGCAGATAA ATTAGAAACA GATTTACTTC AATTTATAGA TGCTGAAGAA 1500 

TTGTATAAGT ATCCTGTGCA AGATATTATG ACCGAAGAGT TTTCAAGACA AAGCCCTCAA 1560

CTGATGAGTG AACGTATTAG AACTTTAACT GCGTTAGCTC AAGGTAAGAA AGGGTTATTT 1620

ATCGTTCCTT TAAATGGTTT GAAAAAGTGG TTAACTCCTG TTGAAATGTG GCAAAATCAC 1680 

CAAATGACAT TGCGTGTTGG TGAGGATATC GATGTGGACC AATTTCTTAA CAAATTAGTT 1740

AATATGGGGT ACAAACGGGA ATCCGTGGTA TCGCATATTG GTGAATTCTC ATTGCGAGGA 1800

GGTATTATCG ATATCTTTCC GCTAATTGGG GAACcAATCA GAATTGAGCT ATTTGATACC 1860 

GAAATTGATT CTATTCGGGA TTTTGATGTT GAAACGCAGC GTTCCAAAGA TAATGTTGAA 1920

GAAGTCGATA TCACAACTGC AAGTGATTAT ATCATTACTG AAGAAGTGAT CAGCCATCTT 1980

AAAGAAGAGT TAAAAACTGC ATATGAAAAT ACAAGACCCA AAATAGATAA ATCAGTGCGC 2040 

AATGATTTGA AAGAAACGTA TGAAAGCTTT AAATTATTCG AAAGTACATA CTTTGATCAT 2100 

CAAATACTAC GTCGCTTAGT AGCGTTTATG TATGAAACAC CTTCGACAAT TATTGAGTAT 2160 

TTCCAAAAAG ATGCAATCAT TGCAGTTGAT GAATTTAATC GTATTAAAGA AACTGAAGAA 2220

AGTTTAACAG TAGAGTCTGA TTCGTTTATT AGCAATATTA TTGAAAGTGG TAATGGATTT 2280

ATAGGACAAA GTTTTATAAA ATATGATGAT TTTGAAACAT TGATTGAAGG CTATCCTGTC 2340 
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TCATGTAAAC CTGTCCAACA ATTTTATOOG CAATATGACA TTATG CGTTC TGAATTTCAA 2460 

CGATATGTTA ATCAAAACTA TCATATCGTG GTTTTGGTCG AAACCGAAAC TAAAGTTGAA 2520 

CGTATGCAAG CGATGTTAAG TGAAAtGCAT ATTCCATCAA TAACAAAATT GCATCGCTCA 2580 

ATGTCATCGG GGCAAGCAGT GATTATTGAA GGCAGTTTAT CTGAAGGATT TGAACTACCT 2640 

GATATGGGAT TAGTTGTCAT TACTGAGCGT GAqcTTTTTA AATCAAAACA GAAAAAGCAA 2700 

CGAAAACGTA CGAAAGCTAT CTCAAATGCT GAAAAAATTA AGTCTTACCA AGATTTAAAT 2760 

GTGGGAGATT ATATTGTTCA TGTGCATCAT GGTGTTGGTA GATATTTAGG TGTTGAGACG 2820 

CTCGAAGTGG GGCAAACGCA TCGTGATTAT ATTAAATTGC AATATAAAGG TACGGATCAA 2880

CTATTTGTTC CAGTAGATCA AATGGATCAA GTTCAAAAAT ATGTAGCTTC GGAAGATAAG 2940 

ACGCCAAAAT TAAATAAACT CGGTGGCAGT GAATGGAAAA AAACAAAAGC TAAAGTTCAA 3000 

CAAAGTGTTG AAGATATTGC TGAAGAGTTG ATTGATTTAT ATAAAGAAAG AGAAATGGCA 3060

GAAGGTTATC AATATGGGGA AGACACAGCT GAGCAAACAA CATTTGAATT AGATTTTCCA 3120 

TATGAACTTA CGCCTGACCA AGCTAAATCT ATCGATGAAA TTAAAGATGA CATGCAAAAA 3180 

TCGCGTCCAA TGGATCGCTT GCTATGTGGT GATGTTGGTT ATGGTAAAAC TGAAGTTGCA 3240

GTGAGAGCAG CATTCAAAGC TGTAATGGAA GGAAAGCAGG TTGCATTTTT AGTTCCTACA 3300

ACTATTTTAG CTCAGCAACA TTATGAGACG TTAATTGAGC GTATGCAAGA TTTTCCTGTT 3360

GAAATTCAAT TAATGAGTCG TTTTAGAACG CCTAAAGAGA TAAAACAAAC TAAGGAAGGA 3420 

CTTAAAACTG GATTTGTTGA CATAGTTGTT GGTACACACA AATTACTTAG TAAAGATATA 3480

CAGTATAAAG ATTTAGGGCT GTTGATTGTA GATGAAGAAC AACGATTTGG TGTACGCCAT 3540

AAAGAGCGTA TTAAAACATT AAAACATAAT GTAGATGTAC TAACATTGAC TGCAACCCCA 3600 

ATAGCTAGAA CATTGCATAT GAGTATGCTA GGTGTGCGGG ATTTGTCAGT GATTGAAACG 3660 

CCGCCAGAAA ATCGTTTCCC AGTTCAAACA TATGTATTAG AACAGAACAT GAGTTTTATC 3720 

AAAGAAGCTT TAGAAAGAGA ACTATCCCGT GATGGCCAAG TGTTTTATCT TTATAATAAA 3780 

GTGCAATCCA TTTATGaAAA ACGAGAACAA CTCCAGATGT TAATGCCAGA TGCTAACATT 3840 

GCAGTTGCTC ATGGACAAAT GACAGAGCGC GATTTAGAAG AAACGATGTT AAGTTTTATC 3900

AATAATgAAT ATGATATTTT AGTAACGACG ACGATTATTG AAACAGGTGT CGATGTCCCA 3960

AATGCAAATA CTTTGATCAT TGAAGATGCA GATCGCTTTG GATTGAGTCA GTTGTATCAA 4020

TTAAGAGGTC GTGTTGGTCG TTCAAGTCGT ATTGGTTATG CATACTTCTT ACATCCAGCA 4080 

AATAAGGTAC TAACTGAGAC TGCAGAAGAT CGATTACAAG CGATTAAAGA ATTTACGGAG 4140
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10 






TTAGGTAAAC AACAGCACGG CTTTATTOAT ACAGTTOOAT TTGATTTGTA CAGTCAAATG 4260

TTAGAAGAAG CTGTAAATGA AAAACGTGGT ATTAAGGAAC CAGAATCTGA GGTGCCAGAA 4320

GTCGAAGTTG ATTTAAACTT GGATGCATAT TTGCCAACAG AATATATTGC AAATGAACAA 4380

GCTAAAATTG AA 4392

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 729 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

TGA ATCTATATCG AGGTGGTTGG TAGGTTCATC TAAAATAAGT ACATTGTCAC 60 

GTTGCAACAT AAGTAGTGCT AGTTGTAAAC GTGCTTTTTC ACCACCAGAT AAATCATTAA 120 

TTATCTTTTT AACATCGTCT TGTACAAATA AGAAACGTCC AAGAACTGCT CGAATATCTT 180 

TTTCATTCAT TAACGGATAT TGATCCCACA CATAATCTAA AATCGTTTTA CTAGATTTAA 240 

ATTCTGCTTG CTTTTGATCA TAATAACCAA TTTGTAAATT TGCGCCGAAA GTAATATCGC 300

CATTAAGCGC TTTTTGTTGA TTAGCAATAG TTTTAATTAA GGTCGATTTT CCAATACCAT 360 

TTGGCCCAAT GATTGCTATA TGATCGCCTT TAGAGACCTC TATACTCATA GGTTTGGTAA 420 

TTGCAGTTTG ATAACCGATT TCTAAATTTT TTACATGCAT GACGTCATTA CCTGTATTCC 480

GGTCAAAGCC AAATTGAATA TTTGCACTTT TGGCATCTAA CATTGGTTTA TCAATGCGTT 540

CCATTTTTTC TAAAATCTTA CGTCTACTTT TTGCCATTCC ACTTGTTGAA GCACGGGTAA 600 

TATTTTTCTC AACAAAAGTT TCTAATCGTT TTATTTCTGC TTGTTGACTT TCATATTCTT 660 

GCATTCGTTT TTGATAATAT AAATCCCGTT GCTGTATAAA TTCCTCGTAA TTACCAACAT 720

AGCGTTTGA 729 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13856 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 





(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
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TGATGTTTCG ATACATTTGT TGCACCTTGT 
GGATATACTT TAAAGGTTGT GTCGTATGTT 120

TCCTTACTAT CTTTAGCTTC AGATTCCTGT GATTCAACCG TTTTATATTT TTCAAGTGCA 180

TGTCCTTCAA TATCAACTCG TGGAATAATG CGATTCAACC ATGCTGGTAA ATACCACGAA 240

CCTTTtCCAA ACAATTTCGt TAATGCAGGA ATTAACATCA TtCTGACTAC GAAGGCATCA 300

AAGAGTACAC CAAACGCTAA TGCCATACCC ATTGATTTAA TCATGACATC TTCTTGGAAT 360

ACAAACGCAA AGAAGACACT AAACATAATT AATGCAGCTG CTACAATAAC AGGACCGCTT 420

TCTTTCAATC CTACTTTGAT AGAATAATCA TTATCCCCTG TTTTACTATm yyCTTCATGr 480

ATTCGCGACA TAAGGAAGAC TTCATAATCC ATCGCTAATC CAAATAAGAT ACCTATAGTA 540

ATAACCGGTA AAAATGCTAG CATTGGTCCT GTCGTTTCAA TACCAAACAG ACCTTTCATA 600

AAACCATCTT GCATTACTAA TGTTGTAAAT CCTAATGTTG CCATTAATGA CAAGACGAAT 660

CCTAAAACTG CTTTTAATGG TATTAGAATT GAACGGAAGA CAATCATTAA TAAGAAAAAT 720

GCTAATACAA CAATGACTGA GGCAAATAAA GGTATCGCCT CATTTAACTT TTTAGACATA 780

TCAATATTAA TGACACTTTG TCCCGAAATC TCCGTTTTGA ACCCATATTT ATCTTGTGCA 840

TCTTTATGAT AATCTCGTAA ATCATGCACT AAATCATTTG TACTCTCTGC ATTAGGCCCT 900

TGCTTAGGTA TCACGACCAT CAAAGCGTAA TCATTATCTT TACTCATTTG TGGTGGCGTA 960

ACGATATCTA CATTTTTCTT ATCTTTAATA TCTTTATATA CAGACTGTAA ATCTTGTTGT 1020

AATCCTTGTG GATCATCCTT TTTATCTTTC ACATTTATCA ACATCGGTAT TTGGCCATTA 1080

AATCCTTCAC CAAATTTATC CGAGATAATA TCGTAAGCTT TTTTCTGTGT AGAATCTGCT 1140

GGTTTAACAC CGTCATCTGG AATACCAAGT CGCATATGAC TAACTGGTAT TGCAGCTGCT 1200

ACTAATATGA TTAAACCTAG TAATACTGCC GCAAGTGCAT TTCCTGTAAT AAATTTAGAC 1260

CATGGCGTAT CAATATCTTT TTTGAATTTA GACTGTAATT TATTCACTTT AATGCGTTtA 1320

TGGAAAATGC TTATTAATGC AGGTAATAAA GTTAAAGCGC TAAGTACTGC AAAAACAACA 1380

CTAATTGCCG AAGCAAATCC CATTACCGCT AAGAAGTCAA TGCCTACTAA TGATAAACCA 1440

CATACTGCAA TTACAACTGT TACACCAGCA AAAACAACTG CACTACCTGC TGTTCCTATT 1500

GCAAGACCAA TGCCTTTAAT GTAATCTGTT TCAGTTTTCA TAACTTGTCG ATATCTGAAT 1560

AAAATAAATA ATGCATAATC GATACCAACT GCTAGTCCAA TCATTACGGC TAATGTCAGT 1620

GTGACATTTG GTATATCGAA TGCATAAGTT AACAAACTGA TAATACCTAC ACCAGAGGCT 1680

AGACCAATCA ATGCACTTAT AATTGGTAAT CCTGCAGCAA TGACTGAACC GAATGTGATT 1740

AACAGTACAA CAAATGCAAC AATAATACCA ACTAGTTCAG AATTACCGCC TACTTCTGTA 1800
ACTAGTTCAG AATTACCGCC TACTTCTGTA 1800 
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AAATGACTTT TAACATTATC TCTAGAGCCA TCTTTTAAAG ATGTTTGACT AACGTCATAT 1920 

GTGATATCTG CAAATGCAGT TGTTTTATCT TTACTAATTT GCTTATTTTC ATAAGGATCT 1980 

GATATTTTAT CAATGTGCTT GTCATCTTTT TTAATATCAT CTAACGTTTT CTTAATATCT 2040 

TTAGTAATGT TCGGTTGCAC AATACCATCA TCTTTAGTCG TCTTAAAGAC AACACGTATT 2100 

TGTGCCTTTT CACTATCTTG ATTAAAATGT TTTTCAATCT TTTTATTCGT ATCTAACGAC 2160 

TCTAATCCTG TCATTTTAAT ATCATTGTCA AATTTCGGTG CATTTGTAGC AAGTGGTATC 2220 

AATATTGCAG CTACAATCAC TATCCATGCA ATOACCGCGG ACCATTTATG TTTTGCGATG 2280 

AATGTCCCCA TCTTATATAA AAATTTTGCC AAAGTATATT GCCTCCTTTT AAAATCAACG 2340

TTATAGTTTA AATATACAGT GTAGATTATT GTTCGATTAT AGTATCTATC CCCGACCTCT 2400 

TAAAGAATCA ATTGGAAAAT TTTGTATATT AAACTACACA CAAAGGAGAA ATGTAGATGA 2460 

AAGAGACTGA TTTACGAGTT ATAAAGACAA AAAAAGCATT GTCGAGTAGC TTGCTACAAT 2520

TGTTAGAACA GCAATTATTC CAAACGATTA CTGTCAATCA AATTTGCGAC AACGCACTCG 2580 

TACACCGTAC AACATTTTAT AAACATTTTT ATGATAAATA TGATCTTCTA GAGTACTTGT 2640 

TCAATCAATT GACTAAAGAC TACTTTGCTA GAGATATCAG TGACCGTCTT AATCATCCAT 2700 

TCCAAACGAT GAGTGATACG ATTAATAATA AAGAGGATTT GAGAGAAATC GCAGAATTCC 2760 

AAGAAGFAAGA CGCTGAATTT AATAAAGTAT TAAAAAATGT CTGCATTAAA ATTATGCATA 2820

ACGATATCAA AAATAATAGA GACCGTATCG ATATTGACAG CGACATCCCA GATAATCTCA 2880

TATTTTATAT TTATGACTCG TTGATTGAAG GTTTTATACA TTGGATAAAA GATGAAAAAA 2940

TTGATTGGCC TGGCGAAGAT ATTGATAACA TTTTCCATAG ATTAATCAAT ATTAAGATTA 3000

AATAGTAGAT GAGAAACTCA TGAGCGTTAC CAACATTCAT AATAAAAACG ATAGTGkACA 3060 

CGTTAATGAA TTCGTGTACT ACTATCGTTT TTTATTTTTA TCGTGCTTAT CGCTATTAAA 3120 

ACAACTGATA CACAACACAT AAACTATGAA GAAAAAAATA AATCCGCTAT CTAAATGACT 3180 

TTGACTCAGT TGTTTAAATG ACCAAATTGC TAATACAATT CCCATTATTA TTGAAATAAC 3240

GTATCTCACA TTCTTATACC TATAATCCTT TTCTAAAAAT ATGGTTGCTA TTACTTAATT 3300

TTTAAAGTTA TAAATAAAAA GAGCCAACCG CAATGGATGG CCCTTGTTCA TTATGAAGCA 3360 

TTAGAACATT TCTGAAACAA CCTTTTGTTC TAAGAAGTGT AATAAGTAGT CTGGACTACC 3420 

TGTTTTAGCG TCCGTACCTG ACATTTTGAA ACCACCAAAT GGATGGTATC CAACAACTGC 3480

TGAAGTACAG CCTCTGTTAA GGTATAAATT GCCTACATCA AATTCGTTTA CCGCTTTAAT 3540

CCAATGCTCG CGATTATTTG TAATCACTGC ACCAGTTAAA CCGTAATCTG TATCATTTGC 3600 
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TTCTTCTTGC ATQATTCTAT CTTTAGATTT AAGTCCTOAA ATGATTGTTG GTTCTACAAA 3720 

GTAACCTTTT GAATCATCAG TGCCGCCACC TTGTTCTAAT TTACCTTCTT CTTTACCAAT 3780 

CTCAATATAA TTTTTAATCT TATCAAATTG TTTTTTATTA ATAACTGGGC CCATATACGT 3840 

ATTGTCTACA GTATTGCCCA ACGTTAATTC TTTTGTTAAT TTGATTGATT TCTCTAATAC 3900

TTCGTCATAA ACGTCTTTAT GCACAATTGC ACGTGAACAT GCTGAACATT TTTGACCAGA 3960 

AAAACCAAAT GCTGACGTTA CAATAGCTTC TGCTGCCATA TCTGTATCAA TATTTTCATC 4020 

AACTACAATG GCATCTTTAC CACCCATTTC AGCGATAACA CGTTTCAAGA AGTTTTGACC 4080 

TTCTTGAACA ACGGCACTAC GTTCATAAAT TCTAGTACCT GTOGCACGTG ATCCTGTAAA 4140

TGTAACGAAA TGCGTATCTT TATGATCAAC TAAGTAATCA CCAATTTCTT TCGGATCACC 4200 

AGGAACAAAG TTAACTACGC CTTTTGGTAA TCCTGCTTCT TCTAAAATTT CCATTAATTT 4260 

ATAAGCGATA TAAGGTGTAT CCTCAGCAGG TTTCAATAAC ACTGTATTAC CTGCCACAAC 4320

TGGTGCTAAA GTTGTACCAG CCATAATCGC AAACGGGAAG TTCCACGGCG GAATTGTAAC 4380 

ACCTGTACCA ATTGATTTAT AGAAATATTT ATTGTGTTCA CCTTCACGAT CAAGTACTGG 4440 

CTTACCTTGA GCCAAGTCCA TCATTGAACG TGCATAGTAT TCAATAAAAT CAATACCTTC 4500

AGCTGCATCA CCAACTGCTT CATCCCATGG CTTACCTGCT TCATAAACCA TAATTGCTGC 4560 

AATTTCCGCT TTTCGACGAC GAATAATTGC CGAAACACGT AACATAAGCT CTGCACGATC 4620 

ATTTGCTGAC CATGTTTTCC AAGATTTATA AGCTTCGTTT GCTGCTTTAA ACGCATCTTC 4680

AACATCTTGT TTTGTTGCCT TTGATGCATT TGCAATCACT TGTGATGTGT CTGCAGGATT 4740

GATTGATTTA ATTTTGTCAT CTTTGAAAAT CTTCTCTCCA TTAATCACTA ATGGTATGTC 4800

TTGACCTAAT TCTTTTTCCA CGTCTTTCAA TGCTTTCTTA AACATATCCA CATTTTCTTG 4860 

GACTCAAAAA TCGTAACCAG GTTCATTTTT AAATTCTACT ACCATGTACA CTTACCCCCT 4920 

ATAAATTTTG AAAGTGGTTT AACCCTTTGA TTTAATGATA TAACATCATT TAAACTCATT 4980

TTACTATGAT TAAGGTTAGT TTTGCAATCG CTTTCATTTT TATGTTTTAT CACTTATTCT 5040 

CAAGTATTTT GAAATTGATT GGTTACTTTT TAAAATTTAT ATGGGTCGCA ACTGCTACTT 5100 

TATCGTTTCG TCATTTAATG TTTCGGATGG TAGGTCATTA TCAATTTTAC GAACGACTTT 5160 

ACAAGGGTTT CCAACCGCTA AGCTGTGTGG CGGAATATCT TTAGTGACAA CACTACCAGC 5220 

ACCAATCACA CTGCCTTCTC CAATCGTCAC CCCTGGTAAC ACGGCTACAT GACCGCCAAA 5280

CCAAGTATTA CTGCCAATAT GAATGGGTCC GGCTTTTTCA AAACCTTCAT TTCTATGATG 5340

GAAATTAAGT GGATGTGTCG CTGTGTAGAA TCCACAATTA GGTCCTATAA AAACATTATC 5400
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TCCTAGTTTA ACGTTCCAAC CATAATCTGT ATCAAAAGGA ATCGAAATAC TTACATTGTC 5520 

TGTTGTTGTT TGAAATAATT GATCAATTAA TTCCTTTCTT TTATTTGTAG CACTCGGTCT 5580 

TGTATGATTT AATTCAAAGC AAATATCTTT CGCTCGTGCA CGTTCATTGA TTAAGTATTG 5640 

ATCAAAGTTT GCATCGTACC ATTTTTCTGC TAACATTTTT TCTTTTTCAG TCATTACACC 5700

TTTCAACTCC TAATAACTTA TTTACTTGTT TAAAAGTTAA TCAAATAAAC CTTCGCCTAT 5760 

GCAACTAATA CGCTATAACA TTATGAAATC ATGACCTTAT CACCCTTATC TATACAATTC 5820 

TCGCATCAAA TACTGCTAAA GTAGTAGATA AATTCAATAC TACAGACGCA TTCATTTTTT 5880

AATCTATTAA CGTACAATGT GAGTAAGAGA AATATAAAGG AGTATGATAG CGATGAGAAT 5940

ATTAATTACA GGCACAGTTG CTATCTTAAT CATTCTAGGT TTGGTCAAAA CGATACAAGA 6000 

TTACGAAATG ACAAACGACA CGAGTCGTcA GTTGTCAGAC AACAAAGATG ATGATAAAGT 6060 

CATCCATCTT AATAATTTTA AAAATTTACA TGCGAAAGAA TTTAACCCAT CTGATTTCTT 6120

TTAAGTCACC TAAGAATTGC AAATCCAGAA GTCATTTAAG TTTTACCTTT CATTCATACA 6180 

TCCTTTAATA TTAATTACGA CTTCTTTTAT ATAGATGCTA AGTAGAGAGA TTGTTGTGCA 6240 

ATGTTTGCAC GGCAATCTCT CTTTTTCTTT TTAAAATTGG TAAAAGTAAA ACGCAACGAT 6300 

TGACTTATAT ACCTATAGGG GGTACATTAG ACGTGTAACA ATGAATCACA GGGAGGCAAT 6360 

AATGTGGCTA ATACGAAAAA AACAACATTA GATATCACTG GTATGACTTG TGCCGCATGT 6420

TCAAATCGTA TCGAAAAGAA ACTGAATAAA CTTGATGACG TTAATGCCCA AGTGAATTTA 6480

ACTACAGAGA AAGCAACTGT TGAGTATAAC CCTGATCAAC ATGATGTCCA AGAATTTATT 6540

AATACGATTC AACATTTAGG TTACGGTGTC GCTGTAGAAA CTGTCGAATT AGACATTACA 6600

GGTATGACTT GTGCTGCATG CTCAAGCCGT ATTGAAAAAG TGTTAAATAA AATGGACGGC 6660 

GTTCAAAATG CAACGGTCAA TTTAACAACA GAGCAAGCTA AAGTTGACTA TTATCCTGAA 6720 

GAAACAGATG CTGATAAACT TGTCACTCGC ATTCAAAAAT TAGGTTATGA CGCGTCTATT 6780 

AAAGATAACA ATAAAGATCA AACGTCACGC AAAGCTGAAG CGCTACAACA TAAATTGATT 6840

AAGCTTATCA TATCAGCAGT ATTATCTTTA CCACTATTAA TGTTAATGTT TGTACATCTT 6900

TTCAATATGC ATATACCAGC ACTATTTACG AATCCATGGT TCCAATTTAT TTTAGCTACA 6960

CCTGTACAAT TTATTATTGG ATGGCAATTT TATGTAGGTG CTTATAAAAA CTTAAGAAAT 7020

GGTGGCGCCA ATATGGATGT ACTTGTTGCT GTTGGTACAA GTGCAGCATA TTTTTACAGT 7080

ATTTATGAAA TGGTTCGTTG GCTAAATGGC TCAACAACGC AACCGCATTT ATACTTTGAA 7140

ACAAGCGCCG TACTAATTAC CTTAATCTTA TTCGGTAAGT ATTTAGAAGC TAGAGCGAAG 7200 
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TTAAAAGATG GTAATGAAGT QATGATTCCT CTAAATGAAG TACATGTTGG AGATACACTT 7320 

ATCGTTAAAC CAGGTGAAAA GATACCTGTT GATGGCAAAA TTATTAAAGG TATGACTGCC 7380

ATCGACGAAT CTATGTTAAC AGGTGAATCT ATCCCTGTTG AGAAGAATGT TGATGATACT 7440 

GTAATTGGTT CAACGATGAA CAAAAACGGT ACTATTACTA TGACAGCAAC AAAAGTTGGC 7500 

GGGGACACTG CGTTGGCAAA TATTATTAAA GTTGTCGAAG AAGCTCAAAG TTCTAAAGCG 7560 

CCGATTCAAC GATTGGCAGA TATTATTTCT GGTTATTTCG TTCCTATCGT TGTTGGTATC 7620 

GCACTATTAA CATTTATCGT GTGGATTACT TTAGTTACAC CAGGTACATT TGAACCTGCA 7680 

CTTGTTGCGA GTATTTCCGT TCTCGTCATT GCTTGTCCAT GCGCATTGGG ACTTGCTACA 7740

CCAACTTCTA TTATGGTAGG TACTGGTCGC GCTGCTGaAA ATGGTATTTT ATTTAAAGGT 7800 

GGCGAGTTTG TTGAACGCAC ACATCAAATT GATACCATCG TTTTAGATAA GACGGGTACC 7860 

ATTACAAATG GTCGTCCAGT CGTGACAGAT TATCATGGTG ACAATCAAAC GCTACAACTA 7920

CTTGCTACTG CTGAAAAAGA TTCTGAACAC CCATTGGCAG AAGCCATTGT CAATTATGCA 7980

AAAGAAAAGC AATTAATATT AACTGAGACA ACAACATTTA AAGCAGTACC TGGCCATGGT 8040

ATTGAAGCAA CGATTGATCA TCACCATATA TTGGTTGGTA ACCGTAAATT AATGGCTGAC 8100

AATGATATTA GCTTGCCTAA GCATATTTCT GATGATTTAA CACATTATGA ACGAGATGGT 8160 

AAAACTGCTA TGCTCATTGC TGTTAATTAT TCATTAACTG GTATCATCGC AGTGGCAGAT 8220 

ACTGTCAAAG ATCATGCCAA AGATGCTATA AAACAATTGC ATGATATGGG CATTGAAGTT 8280

GCCATGTTAA CTGGCGATAA TAAAAACACT GCTCAAGCCA TTGCAAAACA AGTAGGCATA 8340 

GATACTGTTA TTGCAGATAT TTTACCAGAA GAAAAAGCTG CACAAATTGC GAAACTACAG 8400

CAACAAGGTA AGAAGGTTGC GATGGTTGGT GACGGTGTAA ATGATGCACC TGCATTAGTT 8460 

AAAGCTGATA TCGGTATCGC CATTGGTACA GGTACAGAAG TTGCCATTGA AGCAGCTGAT 8520 

ATTACTATTC TTGGTGGOGA CTTGATGCTT ATTCCTAAAG CCATTTATGC AAGTAAAGCA 8580

ACCATTCGTA ATATTCGTCA AAATCTATTT TGGGCATTCG GCTATAATAT TGCCGGTATC 8640 

CCTATAGCTG CATTGGGCTT ACTTGCGCCA TGGGTTGCTG GTGCTGCAAT GGCACTAAGT 8700 

TCAGTAAGTG TTGTCACAAA CGCACTTAGA TTGAAAAAGA TGCGATTAGA ACCACGCCX3T 8760 

AAAGATGCCT AGATTCCTTA ATAATGAAGG ATTCGTTGGT GATTCTGAGA TAGGCTAGTG 8820 

ATTGGCTCTA TAATGTCGCG GTTTAyaGTt GGATCTTCGC TCCAACTGCA TATATAGTnA 8880 

CACTTTTCGC TTGGCGAATT AGTGTATCTT ACCTAATAGc TCCGCCTATT AGGTTCCATC 8940

ATTATTATAA ATAATAAGTA CACTACGGtT TACAGTTGGA TCTTCGCTCC AACTGCATAA 9000
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GAAATTTTAA ATGTTGAAGG TATGAGCTGT GGTCACTGCA AAAGTGCTGT TGAATCTGCA 9120 

TTAAATAATA TTGACGGTGT CACTTCAGCT GACGTTAACC TTGAAAATGG TCAAGTAAGT 9180 

GTTCAATATG ATGACAGTAA AGTTGCTGTA TCTCAAATGA AAGACGGAAT TGAAGATCAA 9240 

GGTTAOGATG TCGTTTAATT AGGCAATATT CAACGTCATC AACACCAAAT TAAAAAATCG 9300 

AACTGATGAG AATCCCAACA ATCCAAATTA TCTCATCAGT TCGATTTTTA ATTTACTCGT 9360 

AACCTAGTAT CTCCAGTCTG CAATACATCT AATGTTGCAT CTAATGCATC GACAATTAGA 9420 

TTTTTAACTG CAGCTTCAGT ATAAAACGCA ATATGTGGTG TTAATATGAC ATCTTCCCTG 9480

TCAATCAACG ATTCTAACAA TGGATCGTTC AGTGTTTTGC CCCTTTGATC ACTTGGGAAA 9540

AGTTTGCGTT CAAATTCATA CGTATCAAGT GCTGCACCTT TAATCACACC ATTGTCTAAT 9600 

GOGTCTAATA ACGCCTTAGT ATCTACTAAA GAACCTCTCG CACAATTGAC AAATACTGCG 9660 

CCCTTTTTAA AATGTTTAAA TAATTCAGCA TTAAATAGAT AATGATTATA TTTCGTTGCA 9720

GGTACATGTA ATGTCACGAT ATCAGCACCT TCAACCGCTT CCTCAATCGT ATCTTTGTAA 9780 

TCGACATACG TTGCAATTTT AGCATTAGGA AACGGtCGTA TGCGACCACA TCACTTTGAT 9840 

AACCATTGGC AAATATATCG GCTACTACAC GGCCAATTCG ACCTGTACCA ATAACAGCTA 9900

CTTTTAAATC TTTAATGGAT TTCGATAAAA TAGTAGGTTC CCATCTAAAA TCATGcTCCC 9960 

GCACTTTCGT TTGAATTTGA TTAAAATGAC GAACCACATT AATAGCCTGG TTCACAGCAA 10020 

ACTCCGCAAT TGAATTCGGA GAGTATGACG GCACATTTGA CACAATAAAG TTATACTTGT 10080

TTGCTAACTC CAAATCATAT GTATCAAATC CAGCACTACG TTGTGCGATT TGTTTAATAC 10140

CTAGTTCATT TAATCGTTTA TAAACATGCT CTGATAATGG TATTTGTTGT GATAGCGATA 10200

AGCCATCATA ACCAGCGACA CCTTCAACAT TGTCATCAGT TAATGCTTCT TTAGTAATAT 10260

CTACCTCAAC ATGATGTTTC TCTGCCCACG CCTTGATATA AGGCATATCT TCATCACGTA 10320 

CACTCATGAT TTTAATTTTT GTCATTTTAA CATCACCCTT AACTTTATTA TTCATATAAA 10380 

TATGCTAGTT CTGTTAATCT TATTGCAGCT TCGTCTAATT TCTGGTCATC TAACGCCAAT 10440 

GAAATTCTCA CATAACGATT ACCATTCTCT CCAAATGGTT TCCCTGGAGC AACAAGTATT 10500 

GACTTCTCTT GCACTAAAAA TTGCTCAAAT TGCTCGCTGT CATAACCAGG CGGTGTTTCC 10560

AACCATACAT ATATGCCACC TTTAGCATGA ACAAATGGCA AATCAGCTTT TGCAAGCATG 10620 

GCTTCGAATC GGTCACGACG TGTTTTAAAT ACATTGCTTT GTTCTTCTAA AAAATCATCA 10680

TAATGATTCA AAGCATATAT TGCGGCATCT TGTAATGCAC CAAACATCCC AGGATTTGTG 10740

TGCGTTTGGT ACTTTTTCAA AGCTTGAATC ATATCTTTAT TACCAACTGC AAAACCGACT 10800 
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CCATTTTCCG AAGCAAGTAT ACTAGGATTT TTAGCGTCGA AACCGAAAGC ACCATAAGCA 10 920 

AAATCATGCA CGATTTTAGT GTCTGTACCT TTAAATTTAG cTATCGCTTC ATCAAAAACT 10980 

TCTTTCGTAG CTGTCGATCC AGTTGGATTA TTTGGATACG TTAAATAAAT GAGTTTTGTT 11040 

TTATCTATTA TTTGTGAATC AACTTTGGAC CAATCTGGCA AATAATGTGG CGGTTCTAAA 11100 

TTAAGCGGGA CTGGCTTGCC ATCAGCTAAA AGTACACCTG CTAAATAATC CGTGTAGCCT 11160

GGATCAGGTA GTAATACATA GTCTCCTGGA TTGATAACAC ATGTTGGTAC TGCCACTAAT 11220 

CCATTTTTTG TACCATATAA AATGCATACT TCATCTTCTT TATCTAACGT CACATTATAT 11280

TGTCTTTGAT AAAAATCTAC AATAGCTTGC TTGAACGCTT CTTTACCATG AAAAGCACCA 11340

TATTTTTGAT TTTCAGGAAT AGTTAGTGCT TTTTGAAAAT GATCAATAAT ACCTTGTGGC 11400 

GTGGGCCCAT CAGGGATTCC AACTGCCATA TTAATTAATG GCAATGGTCC ATGTTCGATT 11460 

TTACGTCCCA TCGTTTTCCC GAAATAACTA TCAGGGATAT TTGCTAATTT GTTAGAGATC 11520 

ATCAAATTCC TCCTCTATCA TTAAACATAG CCTGGGCGAC TATCATAATC CTAACAACTT 11580 

GTATCACTCT CATTTAGATG GTTACAATGA CATCGCCATT CACCGTTATG TTCAACAGAA 11640 

CTTATGACAC ACGTTGTATT GAATGAATTT ATTTTCATTT TAGGTAGGTA TAATATTATT 11700 

GTCAATATTA GGAATTTTCA GATTAATATG CACTCAATCG TTATGATTTA ACTGTCATGC 11760 

ATATCCGCAT GCGCAACCAG TTAGATATGC TTATATAAAG TATAACGCCC ATCAAGGTAC 11820 

GTATTCAAAC GTGAACCTTA ACAGGCGTCA TTCATTGTTA AATAAAACTT CTTAAGCACA 11880

TACTTATTTC ACTATGCCTT TTACGTTCCC CTTATACTTT TCTCACATCT TTCTCTTAGA 11940

CTACTCCCTT ATACGCCCCG CTCAATATCT TTAATCATTT CATCTACAGT TATTTTCGCA 12000 

CTCGTTAAGA CAATAGGAAC GCCTGCACCT GGATGCGTAC TTGCACCTGC AAAATATAAA 12060 

TCTTTATAAT CTCGCGATAC ATTTTGTGGA CGATAATAAT TACTTTGCGC TAAAGTTGGC 12120 

ATTAAACCGA ATGCCGAACC AAATTTCGCA TGATACGTTT GCTCAAAATC ATTTGGCGTA 12180

AAGATTGTTT CTGAAACAAT ATGCGATTTT ATATCTTCAA ATACTTCAAT CGTTGCTAAT 12240 

TTACGATAAA TAATTTCCTT TATTTGTTGC GTCAAAGCTT CATCTGACCA ATCGATTCCG 12300 

CTACCTGTTT TAAGTTCCGG CGTCGGCATT AGCACATAAA TACCAGTTTT GCCTTCTGGC 12360 

GCAAGTGATT TATCAGCGAC CGCTGGTACA TACACATAAA TAGAAGGATC ATATGATAAA 12420 

CGTCCCTCAA ATATTTCTTC AATATTGCCT CTAAAGTCAT CTGAAAAAAT AACATTATGA 12480

AGTCTCACTT GATCTGTCAC ATCAATATCT ATACCGATAT ACATTAAAAA TGCTGAACAA 12540 

GAGTAATCTA AGTCTGCAAT TTTATGTGGT GGATACTTTT TAATAGGTGC AAAATCTGGC 12600 
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ATGTCACCAT TCACTTTTAT CGCATCGGCC CGTTTGAATT TAGGATCAAT AATAATTTGC 12720

TCAATTTCAG CATTTAGTTC AATATTAACG CCTAAGTCTT TATTTAATTG CGCTAGCCCT 12780

TGAGCCATGC CATACATACC GCCTTTAATA AAATGCACAC CAAACATCAT TTCAATCATA 12840

GGAATAATTG AATATAGTGA CGGGCCTCGT TTTGGATCAA TTCCTATGTA TAACGTTTGA 12900

AACGCTAAAA GCTTTTGTAT CTTTTCGTTA TCAATATAAT GTTCAATTAG CTGATCTGCA 12960

TGATTTAACG TTTTTAACTT AGCACCTTGC ACAAGTGACG TCATATTATA AAAGTCACTC 13020

GGTTTGCGAT ACGTTCTTTC TAAGAAATAG CGACGTGCAA TTTCATATTT TTTATAAACA 13080

TCCGTTAAAA AGGACATAAA ACCATGCGTT GAACCAGGTT CTATACTTTC TAGCATTTGC 13140

TGTAATTCAG CTAAATCTGT AGGCACCGTT ATACGATCAT CGTGGTCAAA ATACACATCG 13200

TAAATATAAC GTAATTGTCT CAATTCAATA TAATCTTCAT AATTTTTACC ACACGCTGTA 13260

AAAACATCTT TATAAACATC TGGCATCATG ACAATTGTGG GACCCATATC AAATGTAAAG 13320

CCGTCTTTCT TTAATTGATT CATACGCCCG CCTACATTAT TATTTTTTTC AAATATCGTC 13380

ACTTCATGAC CTTGAGAAGC AATACGGGCT GCCGCTGCTA ATCCTGTGAC ACCTGCACCA 13440

ATTACTGCAA TCTTCATTAT TCAACCACCT ATATTCTATG ATATTTACTA TTTATTTCAT 13500

GAAACAACTT TGCCTTTTTC CTCTTATCCA CAAAAACACG TTCATGTAAT GTATAGTTAG 13560

CCTGTCTCAC TTCGTCCAGT ATTTCAATAT ATATACGTGC TGCTAATTCT ATGATTGGTT 13620

GTGCTTCAAT ACTAAATACT TTGATTTGAT CCATAACATC TTGAAAATCT TTTTCTGCGA 13680

TAGCTGCATA ATATTCCCAT AAGTCAATAT AATGATTATT AACACCATTT TGGTACACTT 13740

CAGCAATATC AACTTCATAT TGCTTTAATC GTTGCTTACT AAAATATATC CGTTCATTGT 13800

CAAAATCTTC ACCGACATCT CTTAATATAT TAAnGGGATC CTCTAGAGTC GACCTG 13856

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10088 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

ATATATAAAT ATAGATTAAG TATATAGATT AATCAACTTT TTTGGAAGAG CAAATCACGC 60

AATCAACAAA TAATATAAGA AGTTTTTGCG ATAGTTTTAA AATAGCTGTA ATAGAATACT 120 

AAATGTGACA AACTTAGAAC TAATATCAAG TGTTGATGTT TTGAATATAA AAATGCTAAT 180 
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ATAATTGGTT AATATATGAG TAATTAGAAA ATAGACAAAG GATGACGATT TATGTATATC 300

AATATGAAAG ATTATGGGTT AACAGGCATA AACAAAACTA AAGATACTCG AGCAATACAA 360

CGTGCGTTAA ATCGTGGAAG ATGTAAACCA ACGACAGTTT ATATACCGAA AGGGACGTAT 420

GATATTTGCA AACCATTAAC GATATATGGC AATACAACAC TTTTGTTAGA TAATGAAACT 480

ATTTTACGCC GATGTCATTC TGGTCCTTTA TTAAAAAATG GTCGTCGCTT TGGTTTTTaT 540

CGTGGTTATA ATGGACACAG TCATATTCAT ATTAAAGGCG GCAAGTTTGA TATGAATGGT 600

GTATCGTATC CTTATAACAA TACAGCTATG TGCATTGGGC ATGCTGAAGA TATTCAATTA 660

ATAGGTGTGA CCATTAAGAA TGTAGTGAGT GGTCATGCAA TTGATGCTTG TGGGATTAAC 720

GGACTCTATA TTAAAAGCTG TTCATTTGAA GGATTCATAG ACTATAGTGG CGAACcTTTT 780

ATTCTOAAGC AATACAATTA GACATTCAAG TACCTGGTGC TTTTCCAAAA TTCGSAACgA 840

CAGATGGTAC GATAACGAAA AATGTCATTA TCGAAGATTG TTATTTTGGA CCTTCAGAAT 900

TGCCCGAAAT GGGAAGTTGG AATCGTGCTA TTGGCTCACA TGCAAGTAGA CATAATCGAT 960

ACTATGAGAA TATTCATATT AGAAATAATA TATTTGAAGA TATACAAGGT TATGCATTAA 1020

CTCCCTTGaA GTATAAAGAT GCTTTCATTA TTAATAATAA GTTTATTAAC TGTGaGGGTG 1080

GCATTAGATA TTTAGGAGTT AGAGATGGTA AAAATGCAGC AGATGTGaTG ACAGGaAAAG 1140

ACTTAGGTTC CCAAGCAGGC ATAAATATGA ATATAATTGG AAATGAATTT AAAGGATCAA 1200

TGTCTAAAGA TGCGATACAT GTACGTAATT ATAATAATGT TAAACATAAA GATGTATTAA 1260

TCGTTGGGAA TACATTCAAT AATTCGACTC AATCAATTCA TTTAGAAGAT ATTGATACAG 1320

TGTTTTTAAG TCCTGTTGAA GCGGGTATTC AAGTTACTAC AATCAATGTA GATGAAATAA 1380

AAAAGTAAAA AGTTTCGCAT GACATTAGGA TTAAGAATAG TAGATAATTT TTGAAAGCGC 1440

ATTGATAAAA CGGTATAAAT ATGCTATAAT AAACCCAATT ATCTGATAAA AGGGGTATTT 1500

TGACGGTAAT GATAATACAA GATAGACAAC TTTCTATACT CTAATATAGT GAGTTGAAGT 1560

AGCTTGTCAT AATCATCATG AGGGGGAAAT TTATGGCTTA TTTCAATCAA CATCAATCAA 1620

TGATATCGAA AAGGTATTTA ACATTCTTTT CAAAATCAAA GAAAAAGAAA CCGTTTAGTG 1680

CGGGACAACT TATTGGACTA ATATTAGGTC CATTACTTTT CCTATTAACA TTATTATTCT 1740

TTCATCCACA AGACTTACCT TGGAAAGGCG TCTATGTTTT AGCGATTACT TTATGGATTG 1800

CGACTTGGTG GATTACTGAA GCAATTCCTA TTGCAGCAAC GAGCTTATTA CCAATTGTGT 1860

TATTACCATT AGGTCATATA CTTACACCAG AACAAGTATC ATCCGAATAT GGCAATGATA 1920

TTATCTTTTT GTTTTTAGGT GGATTTATTT TGGCAATTGC AATGGAAAGA TGGAATTTAC 1980
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*T*TGfSATTf , A'T* 




ONM* X Iwl 1 ii A 


CTATGTTTGT 




W-AvjC J, (jTAA 


2100 




TGATTATGAT 




X X/*VjV_»>\/i 1 lit 




A<J>4 1 vjA 1 1 in 




2160 


6 


ATACGAATCA 


Barn tlczt^tt 






Avi 1 i X I A(j<JA 


A I IXiGCTATG 


2220 












G CCATTAATT 


ATTTTAAAAG 


2280 


10 


GACAATACAT 


GCAACATTTT 


(KiAvJA IXj AAA 


TTAGTTTTGC 


TAAATGGATG 


ATTGTAGGGA 


2340 


TTCCAACGGT CATTGTTTTG TTAGGTATTA CTTGGCTCTA TTTAAGATAT GTTGCGTTTA 2400

GACATGATTT GAAATATTTa CCTGGTGGTC AGACGTTAAT TAAACAAAAG TTAGACGAGC 2460

TTGGCAAAAT GAAGTATGAA GAAAAGGTAG TACAAACTAT CTTTGTACTT GCTAGCTTAT 2520

TATGGATTAC AAGAGAOTTT CTTCTGAAAA AATGGGAAGT TACGTCATCT GTTGCAGATG 2580

GTACGATTQC TATTTTTATA TCAATATTAT TATTTATTAT TCCAGCTAAA AATACTGAAA 2640

AACATCGCCG TATCATTGAC TGGGAAGTTG CAAAAGAGCT CCCTTGGGGT GTATTAATTT 2700

TATTTGGTGG CGGTTTAGCA TTAGCGAAAG GTATTTCTGA AAGTGGTTTA GCAAAATGGT 2760

TAGGCGAACA GTTGAAATCA TTAAATGGTG TTAGTCCGAT TCTTATTGTA ATTGTCATAA 2820

CAATCTTTGT CTTATTTTTA ACTGAAGTGA CATCTAATAC TGCAACTGCA ACGATGATTT 2880

TACCGATTTT AGCAACGTTG TCTGTTGCTG TTGGAGTGCA TCCATTACTA CTTATGGCAC 2940

CTGCAGCTAT GGCGGCTAAC TGTGCATACA TGTTACCAGT AGGGACACCA CCGAATGCAA 3000

TTATCTTTGG TTCTGGTAAA ATATCTATCA AACAAATGGC ATCAGTAGGA TTCTGGGTAA 3060

ACTTAATCAG TGCAATAATT ATTATTTTAG TCGTGTATTA TGTAATGCCT ATAGTTTTAG 3120

GTATTGATAT AAATCAACCA CTGCCATTGA AATAGTAATT GCAGATTAGA ACGAAAAATA 3180

AAAGGTTACA TTAGCAATTG CTTGGACGAG TGGTAACGAA ACGTATACCG CAGCATCGTG 3240

TAASAACAAT ACAAACAAAA GAAAGTCAAC CAAGGATGGA TTCCTATTTT AATCCTTGGT 3300

TGACTCTTTA TTTTATTTAA ATTGTAGAAC CTAGAAAATA AAGTTTAATT AAAAGCACCA 3360

ATCATTTCTA CTTTGAAATC TAAGGTTTCT AAAATAGCAA TGACTTTCTT TATATCGGTT 3420

GTAATTGCAG AATCAGCCTG AACGAAAAAT CGATACATAC CTAATTGTGT TTTTAAAGGA 3480

CGAGACTCAA TCCAGGATAA ATTAATATTA AACAAAGCAA ATGTATTAAG CACACTTGCT 3540

AACAACCCAG GTTTATCATG CATTGGTGTA ATTAAAAACA TCAATGATGT CGCATTTTGA 3600

TCAAATTGCT GCTGATTTTT TATAACTAAA AAACGTGTCA CGTTATGTGG ATAGTCTTCA 3660

ATATGTGTAT CAATAGGTGT AAAACCATAA GctTCGCCAC TACCTAAAGG TGCAATTGCT 3720

GCAACGCCAT TTTCAATTTT AGTCAAACTT TGAATTGTAC TGTCGACATA ATCATAGTCA 3780
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TTTTTAATAT CAGAAATGGA ATCTGTTCCA TTACCATATA ATGCAAAQTT AATATCTAAA 3900 

CGTATTTCAC CGTGTGCAAA GACATCTTGC TGTGCAAGTG CATCTGCCAC AATGTTGATT 3960 

GTTCCTTCTA TAGAATTTTC AATAGGGACA ACACCAATCG ATGTGTCATC ATCTGCAACT 4020 

GCCTTGATGA CTTCAAATAA ATTTGACTTT GGTTGAAAAG TTGCTTCATT TTCAGAAAAA 4080 

TACTGACGAC AAGCCAAATA TGAAAATGTA CCTTTAGGGC CTAAATAATA TAATTGCATA 4140

TGCTACACCT CTACTAACTT AATGATGGAA AGGGCACTGG TTAGCATTTG ATTCTTTCTT 4200 

TTTATAGAAA AAGTTTGGAT CTTTTACTGT ATTGTCATAT CCGTGATGAT AATTTGACGT 4260 

CAATGTTGGA GATAATGGCG GTGCTAGCCA AGACCATTTT CCGGTAACTT GACGACCTTG 4320

TTGTGCTTCG TTACGTTCGA ATAGTTCGAA TTGCTTTGCA GCGGTCAAAT GATCGACAAT 4380 

TGATACGCCT TCTTTTTTAA AGGAATGATA CACAGCATAG TTCAATTCAA CAAGTGCTCG 4440 

ATCTTTATTA AATGAATTAT TTTTAAGTGT ATCAAATTCA AACGCATCTG CAACTTTTTC 4500

TAGTAAATTG TAACGGTAAT CATCAATAAA GTTACGTACG CCAATTTCAG TTACCATATA 4560 

CCAACCGTTA AAGGGTGCAG TTGGATATAC AATGCCACCG ATTTTTAAGT CCATATtGGA 4620 

AATGATAGGG ACTGCATACC ATTTTAAGTT CAATTTTCTT AATTTTGGAT AATGATTATG 4680

TTCAATAGGT ACTTCTTTAA TTAATGAAGT AGGATATTCG TAAAATTTAA CTGACTCATT 4740 

AGGTAATTGG TAAATCAGTG GTAACACGTC AAAATTAGTA CCTTTTCCTT TCCAACCTAA 4800 

GTGATTTGCT AAGCGTGTAA CTTCTTTTTC AGCAGGATCA CCACAATTGT CATAGCCAGC 4860 

ATAGCGAATT AATTGATTGT TGAAAATTTT AGGTCCATCC TTTGGAGCAT ATATAGTAAT 4920 

ATACGGCTTT AATTTACCTT CATTTGTAGC CTGTGTAATA TGATAAGTAA TTGATGATAA 4980

GAACGATGCT TCGTCAGTAA CATCTCTTGC ATCAATGACA TTTAACGAAT CCCAAAATAA 5040 

ACGACCAATG CAACGATTTG AATTACGCCA AGCCATTTTA GCACCATAAA TAAGTTCTTC 5100 

TTCTGTATGT GTATATGTCC CAGTTTCTTT TATTTCTAGT TCAATGTCAT GTAAACGTTT 5160 

ATTGATAATT TGCGTTTCAT AATGACACTC TTTATACATG TTTTCTATGA AAGCTTGAGC 5220 

CTCTTTAAAT AACATTAACA ACACCTCGCT TTATATTATA GTCTACATTA TTAAAATACT 5280 

CTTAAAAATT ATGTATATGT CATTAAATTG TTGGTTGATT TTAATTAAAA GTATGGAAAT 5340 

TAAGGGGCTC TTATGTATAT AAAAAAATGA ATTATGATAA AATGTAAGAA AATATTTAGG 5400 

TCGATTGGAG AGATACAAGT GTACCAATTA GAAGACGACA GTTTAATGTT ACATAATGAC 5460

TTATATCAAA TAAATATGGC TGAAAGTTAT TGGAATGATA ATATTCATGA AAAAATGGCT 5520 

GTATTTGATT TGTATTTTAG AAAAATGCCA TTTAATAGTG GCTATGCTGT TTTTAATGGT 5580 
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10 






TTAAAGTCTA TTGGCTACAA GGATGATTTC TTATCATATT TAAAAGATTT AAAATTCACA 5700 

GGCAGCATCC GTTCGATGCA AGAAGGCGAA TTATGCTTTG GTAACGAACC ATTGTTACGC 5760 

GTAGAAGCAC CATTGATTCA AGCGCAATTA ATAGAAACAA TTTTATTAAA CATTQTAAAT 5820

TTCCATACAT TAATTACAAC AAAGGCTAGC AGAATTCGTC AAATTGCATC AAATOATAAA 5880

TTAATGGAGT TTGGTACACG TCGTQCGCAA GAAATTGATG CAGCATTGTG GGGCGCTAGA 5940

GCTGCTTACA TCGGGGGCTT TGATTCTACA AGTAATGTTA GGGCGGGGAA ATTATTTGGT 6000 

ATACCTGTGT CTGGTACACA TGCACATGCA TTTGTCCAAA CTTATGGAGA CGAATATGTT 6060 

GCCTTCAAAA AATATGCTGA AAGACATAAA AATTGTGTGT TCCTAGTAGA TACATTCCAT 6120 

ACTTTAAAAT CTGGCGTGCC AAATGCAATA AAAGTTGCAA AAGAATTAGG TGACAAAATT 6180 

AACTTTGTAG GTATTCGATT AGATTCTGGA GATATCGCTT ATTTATCTAA AGAGGCAAGA 6240

CGTATGCTTG ATGAAGCAGG ATTTACTGAA ACTAAAATTA TCGCGTCTAA TGATTTGGAT 6300 

GAAGAAACGA TTACGAGTTT GAAAGCACAA GGTGCAAAAG TAGATTCTTG GGGCGTTGGT 6360 

ACAAAGCTGA TTACAGGATA CGATCAACCA GCATTAGGTG CAGTATATAA ACTTGTAGCT 6420 

ATTGAAAATG AAGATGGTTC ATATAGTGAT CGTATTAAAT TATCAAATAA CGCTGAAAAG 6480 

GTTACGACGC CAGGTAAGAA AAATGTATAT CGCATTATAA ACAAGAAAAC AGGTAAGGCA 6540 

GAAGGCGATT ATATTACTTT GGAAAATGAA AATCCATACG ATGAACAACC TTTAAAATTA 6600 

TTCCATCCAG TGCATACTTA TAAAATGAAA TTTATAAAAT CTTTCGAAGC CATTGATTTG 6660 

CATCATAATA TTTATGAAAA TGGTAAATTA GTATATCAAA TGCCAACAGA AGATGAATCA 6720

CGTGAATATT TAGCACTAGG ATTACAATCT ATTTGGGATG AAAATAAGCG TTTCCTGAAT 6780

CCACAAGAAT ATCCAGTCGA TTTAAGCAAG GCATGTTGGG ATAATAAACA TAAACGTATT 6840

TTTGAAGTTG CGGAACACGT TAAGGAGATG GAAGAAGATA ATGAGTAAAT TACAAGACGT 6900 

TATTGTACAA GAAATGAAAG TGAAAAAGCG TATCGATAGT GCTGAAGAAA TTATGGAATT 6960

AAAGCAATTT ATAAAAAATT ATGTACAATC ACATTCATTT ATAAAATCTT TAGTGTTAGG 7020 

TATTTCAGGA GGACAGGATT CTACATTAGT TGGAAAACTA GTACAAATGT CTGTTAACGA 7080 

ATTACGTGAA GAAGGCATTG ATTGTACGTT TATTGCAGTT AAATTACCTT ATGGAGTTCA 7140

AAAAGATGCT GATGAAGTTG AGCAAGCTTT GCGATTCATT GAACCAGATG AAATAGTAAC 7200 

AGTCAATATT AAGCCTGCAG TTGATCAAAG TGTGCAATCA TTAAAAGAAG CCGGTATTGT 7260 

TCTTACAGAT TTCCAAAAAG GAAATGAAAA AGCGCGTGAA CGTATGAAAG TACAATTTTC 7320 

AATTGCTTCA AACCGACAAG GTATTGTAGT AGGAACAGAT CATTCAGCTG AAAATATAAC 7380
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TAAACGACAA GQTOGTCAAT TATTAGCGTA TCTTGGTGCG CCAAAGGAAT TATATGAAAA 7500 

AACGCCAACT GCTGATTTAG AAGATGATAA ACCACAGCTT CCAGATGAAG ATGCATTAGG 7560 

TGTAACTTAT GAGOCGATTG ATAATTATTT AGAAGGTAAG CCAGTTAC6C CAGAAGAACA 7620 

AAAAGTAATT GAAAATCATT ATATACGAAA TGCACACAAA CGTGAACTTG CATATACAAG 7680 

ATACACGTGG CCAAAATCCT AATTTAATTT TTTCTTCTAA CGTGTGACTT AAATTAAATA 7740 

TGAGTTAGAA TTAATAACAT TAAACCACAT TCAGCTAGAC TACTTCAGTG TATAAATTGA 7800 

AAGTGTATGA ACTAAAGTAA GTATGTTCAT TTGAGAATAA ATTTTTATTT ATGACAAATT 7860 

CGCTATTTAT TTATGAGAGT TTTCGTACTA TATTATATTA ATATGCATTC ATTAAGGTTA 7920

GGTTGAAGCA GTTTGGTATT TAAAGTGTAA TTGAAAGAGA GTGGGGCGCC TTATGTCATT 7980 

CGTAACAGAA AATCCATGGT TAATGGTACT AACTATATTT ATCATTAACG TTTGTTATGT 8040 

AACGTTTTTA ACGATGCGAA CAATTTTAAC GTTGAAAGGT TATCGTTATA TTGCTGCATC 8100

AGTTAGTTTT TTAGAAGTAT TAGTTTATAT CGTTGGTTTA GGTTTGGTTA TGTCTAATTT 8160 

AGACCATATT CAAAATATTA TTGCCTACGC ATTTGGTTTT TCAATAGGTA TCATTGTTGG 8220 

TATGAAAATA GAAGAAAAAC TGGCATTAGG TTATACAGTT GTAAATGTAA CTTCAGCAGA 8280 

ATATGAGTTA GATTTACCGA ATGAACTTCG AAATTTAGGA TATGGCGTTA CGCACTATGC 8340 

TGCGTTTGGT AGAGATGGTA GTCGTATGGT GATGCAAATT TTAACACCAA GAAAATATGA 8400 

ACGTAAATTG ATGGATACGA TAAAAAATTT AGATCCGAAA GCATTTATCA TTGCGTATGA 8460 

ACCTCGAAAC ATACATGGTG GATTCTGGAC TAAAGGCATT CGTCGTAGAA AGCTTAAAGA 8520 

TTATGAACCA GAAGAACTGG AAaGTGTAGT AGAaCATGAA aTTCmAAGTA AaTGAGAaTG 8580

AAmCAATtGC TGATTGTTTG TCACGAATGA AAtGCAAGGG TATATGCCGG TAAAACGTAT 8640 

TGAAAAACCC GTGTTTCAAG AGCAAAAAGA TGGCACGGTT GAAGTATCAC ATCAAGAAAT 8700 

CGTtTTTGTA GGTAAGAAAA TCCAATAACA TAATCCAATT TAAATAAAGA CTATTTGAAG 8760

AGGAAAGGCT ATTCAAAGTT TGAGTAATTT TACTTTGAAT AGCCTATTTG TTTATACATG 8820

CAAGATGCTC GATCCATATT GTATGAGAAA CCCCCAGCAA GCTATATAAA GCATATGCTG 8880

GGGGTTCTTA ATATTTTAAA AATTATTGTT AGATTATATA TATCGTCGCT TTTTCTAAAA 8940 

CAATCTCATC GCATGAAATT TTTTCTTCCT AGAGACCTTT AATAAGATTA ATAGTTTACT 9000 

TAATCATATC TAGATAGTCT TATGACTTAT GCTTAATGAA AGTCATTCTA GGAGAAGTTC 9060

CCAAAGCTTC TGTGTTCATA ATTGTTAGTA GTATTTTATT ATCATTTGGT ATAAATATTT 9120

CAATAACAAT TGAGCTATTA TTTTTATTAT ATAATGTGAG TTGTTTGTGT TCTGTATTTA 9180 
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20 






CATTTAAATC TTGAGGATGC CATTCTCCCT CAATAATATT AAQATAATAC TTAGCCTCTG 9300

AATTACATTT GAATTTATCA ATACTAAATA ATTCAATTTG TTCCATAATA TTATTTACCT 9360

TTCTAAAATA CAAATTTTAA TAACCATAAA TAGATGAATA CCATCGATAA TGGTCGCCAT 9420

TGGATACTGG AATAACATTG TTTTTAGCAT CTTGAGTCAT AAAACCATTA TCCCATGGAT 9480

TCCATATAAT TATAACCTCT TGTCCATTAT CTAATTTAGC GTTCCCAACA ACTGCCATGG 9540

CATGCCCTGC GTGCATACCA TTTCTTGATT CTACTCTACT ACCTAAAACA GCAATTCCTT 9600

TATTATTTTT AGTAAGATTG TCAACTTCAT TATATGTAGT CATTCTATTA AGAAGTTGTG 9660

GACTTCTTCC CTGAGTTTGT CCAAAATAAA TCATCTCTCT TGGCGTTAAA CCAGTAAATT 9720

GGAATCGTTG TCCTTGTAAG TTTGGGTGTA AAAATCTCAT CACAGCTTCT GCATGATATT 9780

TGTTAGTATT ATAAGTCGCA TTTAGTAATT CAGACATCGT ATAGCCTGCA CACCAACCAT 9840

TGTTACCTTG AGTTTCTCTT ATCTTGAAAT TCTCAAGTTT ATTTATATAT TGsTCGTTGT 9900

AAGTATAATT ATTACTTTTA AATTGACTAG TTGGCATAGT GACAGAAGCT TTTTGCTTTA 9960

GTTGCGTTAC ATTATTGCCA GTAGGTATAC TCTCAGTCTT TnTnAACTnT nTATCTTCTA 10020

GACGTGGTGT TTTTAGTACT AGTTTAGCTT TATGATTTTG AGTACCACAT AGTAACCTTT 10080

TGAGTTGT 10088

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7563 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

CGGAAACGnA CCCnATGCGT ATGCTTGACG TGCCAAAATT AAATACGAAG TTCATAGCTT 60 

TGAGGTACCA GAAGAACATT TATCTGGTCA AGAAGTCGCA GnACTCATAC AAGCAAATGT 120 

TAAAACAGTA TTTAAAACGC TTGTTCTAGA AAATACAAAA CATGAACATT TTGTATTTGT 180 

TATCCCAGTA AGTGAAACTT TAGATATGAA AAAGGCAGCT GCTTTGGTTG GAGAGAAGAA 240

ATTGCAGCTT ATGCCTTTAG ATAATTTGAA AAATGTAACG GGATACATTC GTGGTGGGTG 300 

TTCGCCTGTT GGTATGAAAA CATTGTTTCC AACAGTCGTT GACAAATCGT GTGAAAATTA 360 

TAGTCATATC AGTGTGAGTG GTGGGCTTCG AACAATGCAA ATCACAATAG CPGTTGAGGA 420

TTTGATTACA ATAACTAAAG GCAAAATTGG AGCAGTTATC CATGAATGAT TAATAACAAC 480 
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TGCCACACTC CTTTTTGATT GAATTAGCAT TTTACGATCA TAAACAGTCA TTATAATTGA 600 

GTATTTGAAC ATAAAAATGT AATTTTATCG TAACAATTTG AGTGTTTGTG ATTGTTTTTG 660 

GTAATTTATG ATTGAAAAGT GAAAGCGTAC TCATTATAAT ACAAAGTGAG ATGGGGTGAT 720

GATGATAATT ACTGaAAAAA GACACGAGTT AATATTAGAA GAACTTTCGC ACAAAGATTT 780 

TTTGACTTTA CAAGAATTAA TAGATCGAAC TGGTTGCAGT GCTTCAACAA TACGArGAGA 840

TTTATCTAAA CTACAACAAT TAGGGAAATT GCAACGTGTG CATGGTGGTG CAATGTTAAA 900 

AGAAAATCGT ATGGTTGAGG CGAATTTAAC TGAAAAATTA GCAAOGAATC TTGATGAAAA 960 

GAAAATGATT GCTAAAATAG CAGCTAATCA AATCAACGAT AATGAATGCT TATTTATCGA 1020

TGCTGOTTCA TCTACATTGG AGCTAATTAA ATATATTCAA GCGAAAGATA TCATTGTGGT 1080 

AACCAATGGT TTAACACATG TAGAAGCTTT ACTTAAAAAA GGTATTAAAA CAATTATGCT 1140 

AGGTGGTCAA GTTAAAGAAA ATACACTTGC TACGATTGGT TCTAGTGCTA TGGAGATATT 1200

AAGACGATAT TGTTTCGATA AAGCTTTTAT CGGGATGAAT GGATTAGATA TTGAACTTGG 1260 

ATTAACTACT CCCGATGAGC AAGAGGCATT AGTTAAACAA ACAGCAATGT CATTAGCCAA 1320 

TCAATCATTT GTACTTATAG ATCATTCTAA GTTTAATAAA GTATATTTTG CTCGTGTACC 1380

TTTGCTAGAA AGTACGACAA TCATCACATC TGAAAAAGCA TTAAATCAAG AATCGTTAAA 1440 

AGAATACCAA CAAAAGTATC ACTTTATAGG AGGGACTTTA TGATTTATAC AGTGACTTTC 1500 

AATCCTTCAA TTGACTATGT CATTTTTACG AATGATTTTA AAATTGATGG TTTGAACAGA 1560 

GCAACAGCAA CATATAAATT CGCTGGGGGG AAAGGTATTA ATGTCTCGCG CGTCTTAAAG 1620 

ACATTGGATG TTGAGTCAAC TGCCTTGGGA TTTGCAGGTG GATTTCCTGG GAAATTCATT 1680

ATAGATACAT TAAATAACAG TGCAATTCAA TCGAATTTTA TTGAAGTTGA TGAAGATACA 1740

CGTATTAATG TGAAATTAAA AACAGGACAA GAAACAGAAA TCAATGCACC GGGTCCTCAT 1800 

ATAACGTCAA CACAATTTGA ACAACTGTTA CAACAAATTA AAAATACAAC AAGCGAAGAT 1860

ATAGTTATTG TTGCTGGAAG TGTACCAAGT AGTATTCCAA GCGATGCGTA TGCGCAAATT 1920 

GCACAAATTA CAGCACAGAC AGGTGCTAAA TTAGTAGTCG ACGCTGAAAA AGAATTGGCT 1980 

GAAAgCGTTT TACCATATCA TCCACTATTT ATTAAACCTA ATAAAGATGA ATTAGAAGTG 2040

ATGTTTAATA CAACAGTGAA CTCAGACACA GATGTTATTA AATATGGTCG TTTGTTAGTT 2100 

GATAAAGGTG CGCAATCTGT TATTGTCTCG CTTGQCGGTG ATGGTGCTAT TTATATTGAT 2160 


AAAGAAATCA GTATTAAAGC AGTTAATCCA CAAGGGAAAG TGGTTAATAC AGTTGGCTCT 2220 

GGTGATAGTA CAGTTGCAGG CATGGTGGCT GGAATTGCTT CAGGTTTAAC GATTGAAAAA 2280 
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CGGGACGCTA TAGAAAAAAT AAAATCACAA GTTACGATTA GCGTACTTGA TGGGGAGTGA 2400

AAATAATGAG AGTAACAGAG TTATTAACAA AAGATACAAT AGCAATGGAT TTAATGGCAA 2460

ATGACAAAAA TGGTGTTATT GATGAGTTAG TAAATCAATT AGACAAAGCA GGTAAATTAA 2520

GTGATGTCGC GTCATTTAAG GAAGCQATTC ACAATCGAGA ATCACAAAGT ACAACTGGTA 2580

TCGGCGAAGG TATTGCCATT CCACATGCCA AAGTGGCCGC AGTTAAGTCA CCAGCTATTG 2640

CGTTTGGTAA ATCTAAAGCA GGCGTAGATT ATCAAAGTTT GGATATGCAA CCAGCACACT 2700

TATTCTTTAT GATTGcAGcG CCAGAAGGTG GCGCCCAAAC ACATCTAGAT GCTTTAGCTA 2760

AGTTGTCTGG TATTTTAATG GATGAAAATG TACGTGAGAA ATTATTACAT GCTTCATCAC 2820

CTGAAGAAGT ACTAGCGATC ATAGATGAGG CTGATGATGA AGTGACAAAA GAAGAAGAGG 2880

CAGAAGCTGA AGCACAACAA GTTGCAACTG CAGAACAATC ATCTAAACAA TCTAATGAGC 2940

CATATGTGTT AGCAGTAACT GCTTGTCCAA CAGGTATTGC ACACACATAT ATGGCACGTG 3000

ATGCATTGAA AAAGCAAGCG GATAAAATGG GTATTAAAAT TAAAGTAGAA ACGAATGGTT 3060

CAAGCGGCAT TAAAAACCAT TTAACTGAAC AAGATATTGA AAATGCAACA GGTATCATTG 3120

TTGCTGCTGA TGTTCATGTT GAGACGGATC GCTTCGATGG TAAAAATGTC GTAGAAGTAC 3180

CAGTAGCAGA TGGTATTAAA CGCCCAGAAG AATTAATTAA TAAAGGATTA GATACAAGTC 3240

GTAAACCTTT TGTTGCCCGT GATGGTCAAA GAAAAGGTAA CTCAAATGAC AGTCAAGAAA 3300

AATTAAGCCC AGGTAAAGCA TTCTATAAAC ACTTAATGAA CGGTGTTTCT AACATGTTGC 3360

CACTTGTAAT ATCTGGTGGT ATTTTAATGG CAATTGTATT TTTATTTGGA GCAAATTCAT 3420

TTAATCCAAA AAGCTCAGAG TACAATGCGT TTGCAGAGCA GCTTTGGAAC ATTGGTAGTA 3480

AAAGTGCATT CGCGTTAATC ATTCCAATTT TATCTGGATT CATTGCACGT AGTATTGCGG 3540

ATAAACCTGG TTTCGCTTCA GGTCTTGTAG GTGGTATGTT AGCAATTTCA GGTGGTTCAG 3600

GATTTATTGG TGGTATTATT GGAGGTTTCT TAGCAGGTTA CTTAACACAA GGTGTTAAAG 3660

CCATGACACG TAAGTTACCA CAAGCATTAG AGGGATTAAA GCCAACATTA ATTTATCCAC 3720

TATTAACAGT GACGGCTACA GGCTTATTGA TGATTTATGC CTTTAATCCA CCAGCATCTT 3780


%3V3 J. IHrtnl V~/\ 


1 X lul 1 >\ 1 In 


n J\. TVin ly TT A A 
VXf\ J. VJ^j/i i. 1 t\J\ 




Aw 11L1 AH 1 


ATTGTATTAT 


3840




TAGGTTTAGT TATTGGCGCT ATGATGGCGA TTGATATGGG CGGTCCATTC AACAAAGCGG 3900

CATATGTTTT TGCAACAGGT GCGTTGATTG AAGGTAATGC AGCACCAATT ACAGCTGCAA 3960

TGATTGGTGG TATGATTCCA CCGTTAGCAA TTGCGACAGC GATGTTAATT TTTAGACGTA 4020

AATTTACAAA AGAACAACGT GGTTCAATTA TCCCTAACTA TGTGATGGGT ATGTCATTTA 4080
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TGATTGGTTC AGGTATAGGT GGCGCAATTG 

CACATGGTGG TATTATTGTA ATTGTTGGTA 

5 

TTGCACTTCT AGTTGGCACA TTAGTTTCAG 

TAACTGAAAC AGAAATCGAA GCTTCAAAAT 

TGATTGTTAG CAAAGAGCTT CATATTAAGT 

10 

TATATCGTGT TAACGGTAGC TTATACAAAG 
TTATGAATTG ATATGAAAGT GTTTTTATTT 

1S CAAATGTATA GACTTTTTTA ATATTTTGCA 

V* X X litWVJUl X
AVXw X X WtwVIn
ATT AfTfi fY5

CTGATGGTRP
Zi f* A fTT A f*TT
wnnn v« X v* X X /\
4200

A A «U» A A Xr%
rGGTTTAATC
AAA #W» A >p
AAACCAAAGT

CAATGGAGGA
fSTAOTTTTAA
\j x nvj a x x iaa
TO ATfJT A AAA
X \Jrt X \9 innnn
ii <] a a

TGTATGTTCA
ATGAATATAT
\3 X X X X/\

nL lllLlnl x
AA1 X W\VJ x X X
4500

TTA/^IlTllftftT
1 xAViMX A/Uil
c ft a. tv* ft fto ft a
*m» a /~»A A
4560

AAAAul 1AI\>
/V"» A A A A ^»
CZCAAACCxAAG
CAGATATAGT
4620

AAAATATGAG TGTCTTAAAG TGAAAATTTA TAAATAAAGA AGGGTTTATA CGTGTCAGAA 4680

TTAATTATAT ATAACGGCAA AGTTTATACT GAAGATGGCA AAATCGATAA TGGTTACATT 4740

CATGTGAAAG ATGGACAGAT TGTTGCAATT GGAGAAGTGG ATGATAAAGC AGCAATTGAT 4800

AATGATACGA CAAATAAAAT TCAAGTGATT GATGCTAAAG GTCATCATGT ATTACCAGGT 4860

TTTATTGATA TACATATTCA TGGTGGTTAT GGTCAAGATG CAATGGATGG GTCATACGAT 4920

GGCTTAAAAT ATCTATCCGA AAATTTGTTG TCTGAAGGGA CGACATCATA CTTGGCCACT 4980

ACAATGACGC AATCGACTGA TAAAATAGAT AATGCACTTA CAAATATTGC TAAATATGAA 5040

GCGGAgCAAG ATGTTCACAA TGCAGCGGAA ATTGTAGGTA TACATTTAGA AGGACCATTT 5100

ATATCTGAAA ATAAAGTTGG TGCTCAACAT CCGCAATACG TTGTACGCCC ATTTATCGAT 5160

AAAATTAAAC ATTTTCAAGA GACTGCTAAC GGATTAATAA AGATTATGAC GTTTGCACCT 5220

GAAATTGAAG GTGCAAAAGA AGCGCTTGAA ACGTATAAAG ATGACATTAT TTTTTCAATT 5280

GGTCATACAG TAGCAACATA CGAAGAAGCA GTTGAAGCTG TTGAGCGAGG AGCTAAACAT 5340

GTCACGCATT TATATAATGC AGCGACGCCA
rrvTi/^j^i j\ A A T" A
1 X L-CAALLA1 A
CiACxAACCAGG
TGTTTTTGGA 5400

GCAGCATGGT TGAATGATGC TCTACATACC
fill & ft TYlft 1*1^2
X ±\»A x\j\jL-Al_.
TCATTCTCAT 5460

CCGGCATCGG TTGCAATTGC TTACCGTATG
nnnvjy x An 1 V»
AnlvV 1 X x 1 X>v
X X XAAX lALt, 5520

GATGCAATGC GTGCAAAAGG TATGCCTGAA
fV3Jlf5 2V ATATY2
#\X X X vjrVjVj X VJVJ
ft Oft 21 AH UPTi 5580

ACTGTTCAAT CGCAACAAGC ACGTCTTGCA AATGGTGCGC TTGCTGGTAG TATTTTAAAA 5640

ATGAATCATG GGTTACGTAA CTTAATATCA TTTACAGGTG ATACATTAGA TCATTTATGG 5700

CGAGTAACAA GTTTAAATCA AGCCATTGCA TTAGGTATCG ATGATAGAAA AGGTAGTATT 5760

AAAGTAAATA AGGATGCAGA TCTTGTTATT CTAGATGATG ATATGAATGT AAAATCTACA 5820

ATAAAACAAG GCAAGGTTCA CACATTTAGC TAATAAATAA TCATAATTAA ATGTATGCAA 5880
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TTTTCTGGGG GTGTCTAAAT GGGAAGGCGA TAACATGTAG TTGTAATTTA AGTCATAGTG 6000 

ATAAATTTGA ATGCGTGTTA CCCATGAGTG ACACATATAA CATGGAGGTG AATCCCTAGA 6060 

AATAGGGAAT TAATTGGAAA CTTCGACCAT AATTAGTTTG ATTATATTTA TTCTATTAAT 6120

TGCATTAACC ACTGTATTTG TTGGTTCAGA ATTTGCATTA GTAAAAATTA GAGCAACAAG 6180 

AATTGAACAG CTAGCAGATG AAGGAAATAA ACCTGCTAAA ATAGTAAAAA AGATGATTGC 6240 

TAATCTAGAT TATTATCTTT CTGCTTGTCA GTTAGGTATA ACAGTAACAT CTTTAGGGTT 6300

AGGTTGGCTT GGTGAACCAA CGTTTGAAAA GCTATTACAC CCAATATTTG AAGCAATCAA 6360 

TTTACCAACT GCATTAACGA CGACGATTTC GTTTGCAGTG TCATTTATAA TCGTTACGTA 6420

TTTGCATGTA GTACTTGGTG AATTAGCGCC TAAATCTATA GCTATTCAAC ATACTGAAAA 6480 

GCTTGCTTTA GTATATGCAA GACCATTGTT CTATTTCGGT AACATTATGA AACCATTGAT 6540 

TTGGCTGATG AATGGTTCTG CACGTGTTAT TATTAGAATG TTTGGTGTAA ATCCTGATGC 6600

CCAAACTGAT GCAATGTCAG AAGAAGAAAT CAAAATTATT ATTAACAATA GTTATAATGG 6660 

TGGAGAAATC AACCAAACTG AATTGGCATA TATGCAAAAT ATCTTTTCAT TCGATGAAAG 6720 


ACATGCAAAA GATATAATGG TACCTAGAAC TCAAATGATT ACACTAAATG AACCTTTTAA 6780 

TGTAGACGAA TTACTAGAAA CAATAAAAGA ACATCAATTT ACGCGTTATC CAATTACTGA 6840 

TGATGGTGAT AAAGACCACA TTAAAGGATT TATTAACGTC AAAGAATTTT TAACTGAATA 6900

CGCTTCTGGA AAAACGATTA AAATAGCAAA CTATATaCAT GAGTTGCCAA TGATTTCAGA 6960

GACAACACGT ATCAGTGATG CATTAATTAG AATGCAACGT GAACATGTAC ATATGAGTCT 7020 

TATTATAGAT GAATATGGTG GAACGGCAGG TATTTTAACG ATGGAAGATA TTTTAGAAGA 7080

AATCGTTGGA GAAATTCGTG ATGAATTTGA TGATGATGAA GTGAATGATA TCGTTAAAAT 7140

TGATSATAAG ACATTCCAAG TAAATGGCAG AGTACTATTG GATGATTTAA CTGAAGAGTT 7200 

CGGTATAGAA TTTGATGACT CTGAGGATAT TGATACGATA GGTGGATGGT TACAATCTCG 7260

TAATACCAAT TTACAAAAAG ATGATTACGT GGATACAACT TATGATCGCT GGGTTGTTTC 7320 

AGAAATCGAT AACCACCAAA TTATTTGGGT GATATTAAAC TATGAATTTA ATGAAGCGAG 7380 


ACCTACTATC GGACAGTCTG ATGAAGATGA AAAATCAGAA TAGATATTAA TATATAAACC 7440 

AACTAAGAAT GATTTAATTC ATTTTTGGTT GGTTATTTTT TTGACTAAAA TTAAnGAAAA 7500 

GTGAAAATAG TATTGGAACT CAATATCTTT AATGATTTAA TGAATAAnTT TTATTGAAAG 7560 


CGA 7563 
(2) INFORMATION FOR SEQ ID NO: 34: 

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(A) LENGTH: 3492 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

TTATATCAAC TTCATGGCGG AACCATTGAT GACCCATTAG ACGAAACAAT AAGCGCATTT 60

SATGAATTGA AACAAGAAGG AATTATACGT GCTTACGGTA TTTCTTCTAT TCGCCCAAAT 120 

GTAATTGATT ATTATTTAAA ACATAGTCAA ATCGAAACGA TAATGTCTCA ATTCAATTTG 180 

ATTGATAATC GTCCAGAATC ATTATTAGAT GCAATTCACA ACAATGATGT TAAAGTATTG 240

GCAAGAGGAC CTGTGTCTAA AGGATTATTA ACTTCAAACA GTGTTAATGT GCTCGACAAT 300 

AAATTTAAAG ATGGTATTTT TGATTATTCT CATGATGAAT TGGGTGAAAC AATAGCCTCT 360 

ATTAAAGAAA TTGAAAGTAA TTTATCTGCA TTGACATTTA GTTATTTAAC ATCACATGAC 420 

GTGCTTGGTT CCATCATTGT AGGTGCAAGT AGCGTCGACC AATTAAAAGA AAATATTGAA 480 

AACTATCATA CTAAAGTTAG TTTAGATCAG ATTAAAACAG CAAGAGCTCG TGTAAAGGAT 540

TTGGAATATA CCAATCATTT AGTGTAGAAG TCATTTTCAG TAATAAAAAC AGCAGCATGA 600 

GGCGTTTCAT TATAAAAATG CCTTACTGCT GTTGTTTATG TACAATTCGC TATAATTTAT 660 

GATTATGATT ACTCACTTAT GATAGAAATT AAAGCGTTGT CCTCACGCAT CAGTATTTAG 720 

TAATTTCGCC TTGCGGCATT GCCTTAAGCA AACTTCTGCC ACTTCATCTC TTAATAATTT 780

TATTAAAACA TCTTTCTATA TTTCACTTCG CATGTTGATT CATCATTATT AGTTATTATT 840

TGTACACCCA GCACATTTCC TTGCAACACA AGTAGTTTGA ATTTTTCACA AGTATAATAT 900

AATGTACCGT CTGAAATTTG GTCTACAGAA ATATCGCCTA AAATATCCAG CACTGTAAAT 960

TCTTCAAATA CTGATAGTTG TTCCGCATAT CGTACACAAA GTCTTACCAC ACTCTCCGAT 1020 

TGACAGTTCA TTGCCATCCC ACCTATTTAT GCTTTATTTT TAAATAATTT AGGGAAACAT 1080

CGTTCAAAAA ATCTAGGCGC AATTTGATAC ATTTTCAACG CATGaTGCAT CCATTTAGGC 1140

CGATTAATTT CCAATTGTTT TGTTTTAATG CCATAAATGA TATCTTCTGC AAGCTGATTA 1200

GCATCAAGCA TAATTTCCCC CATCTTTTTA gCATACTTCA TTGATGGGTC GGCTTTTTGA 1260 

TGAAAAGGTG TATCAATCGG GCCAACATTA ACTGTCATGA TATGTAAGTT TGGTGACTCT 1320

AGTCTTAAAG CATTCATTAA TGCATAAAAC CCTGCTTTCG ATGCCCCATA ATGTGCAGCA 1380

TTTGCTTGTG TGGAAAATGC AGCTTGACTT GAAATACCTA CAATATGTGC GTTAGATGTT 1440

AAATATGGTC TCAACACAGT ATATAAAACA TTAAAACTAA TTAAATTAAG CTGATACGTT 1500 

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TAAATGAATC CATCGAATGA TGTATTGTCT TCAAATTGCA GTGCCTGTAT CGACTTCAAA 1620 

TCATTTAAGT CACAAGGAAT AACATTTATA GTTTTCCCCA ATTCCTGTTC AAAGATTCTA 1680 


GTTGCTTTAT CAACATCACG CACCAACAAC GTTACATGCA CTTTATTTTC TAGTAACTTT 1740 

CGGACAATCG ATAAACCTAA ACCACTCGTA CCACCAGTCA CTATAAAATG TTGTCCTTTC 1800 

ATCAATTAAC CTTCCTTTTC AATTATATAG AATGCAATTT ATCAACTTTA CATAATTGAG 1860

ACAAGTTGAT TATCTTTCCT AATATATATA CAATAATAAG AAAATATAAC ATACAAATCA 1920 

AAAACTAAAG GGATGTGaCG TTAATGrAAC TCGTATTTTA TGGAGCTGGT AATATGGCAC 1980

AAGCTATATT TACAGGrATT ATTAACTCmA GCAACTTAGA TGCCAATGAT ATATATTTAA 2040

CAAATAAATC TAATGAACAA GCTTTAAAAG CATTCGCTGA AAAACTAGGT GTTAACTATA 2100 

GTTATGAtGA TGCGACATTA TTAAAAGATG CAGAyTATGT ATTTTTAGGT ACCAAACCAC 2160 

ATGACTTTGA TGCTCTAGCA ACACGCATCA AACCACATAT TACAAAAGwC AATTGCTTCA 2220

TTTCAATTAT GGCAGGTATT CCGATTGATT ATATTAAACA ACAATTAGAA TGCCAAAATC 2280 

CaGTTGCTAG AATTATGCCA AACACAAATG OGCAAGTTGG ACACTCTGTT ACTGGCATTA 2340 


GTTTTTCAAA CAACTTTGAC CCTAAATCTA AAGATGAAAT TAACGATTTA GTTAAAGCAT 2400 

TTGGTTCTGT AATTGAAGTA TCAGAAGATC ATTTACATCA AGTAACAGCT ATCACCGGAA 2460

GCGGCCCAGC ATTTTTATAT CATGTATTCG AGCAATATGT TAAAGCTGGT aCsAAACTTG 2520 

GTCTAGAAAA AGAACAAGTT GAAGAATCTA TACGCAACCT TATTATAGGT ACAAGTAAGA 2580

TGATTGAACG TTCAGAtTTG AGCATGGCTC AATTAAGAAA AAATATTACC TCTAAAGGTG 2640 

GTACGACACA AGCTGGCCTT GATACATTGT CACAATATGA TTTAGTATCT ATTTTCGAAG 2700

ATTGTCTAAA CGCTGCCGTC GACCGTAGTA TTGAACTTTC TAATATAGAA GACCAATAAA 2760 

AACA5ACCCG CCAACACATG TATGCATCAT CGCAAGCACT GTGTTTGACG GGTTATTTTT 2820

ATAATTTATT GTTATTTGGC AAGCATTGTT TATTACTTTG TCATTAGATT TTAAAACTAT 2880

CAAAATCTTT TACAAAATTA AAATTAGGTG TATCTTCATT TTGTATCAAT GTTTGATAAA 2940 

TTTCATTTAT ATCTTCTGTA TTATAGCGAT TGCTCAAATG TGTAATCAAC GTACGTTTAA 3000 

CATTGGCTTC TTTTATCAAT GCAAATACGT CTTCAATATG GCTATGATGA TAATTGTTGG 3060

CTAAATGCTT TTCACCATCT ATATAGGTCG CTTCATGTAC CATCACATCA GCATCTCTAG 3120 

AAATCACACG TTCATTAGAA CATGGTTTTG TATCACCAAA AATTGCTACA ACTGGACCCT 3180 

GTTTGGACTC ACCTCTAAAA TCTTTTGATT GATAAACTTG ACCATTATGT TCAAATGTAT 3240

CATGAGATTT TACTTCTTGA TATTTAGGAC CTGGTTCAAG ACCAATGTTT TTTAACGCTT 3300
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CATQATTAAG T AAATGCG CC TCTACAGTAA AACCATCCAT GATGATATGT CAGATGATCA 3420 

TCGATTTCAA TATATGtAAT TGGATAGTTT AAATGTGACT CTGATAAATT CATAGACATT 3480 

TCCACATATG CT 3492

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1973 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

ATCTAGCGGT ACAAGCGTCT TGGAGGCTAG TATGTTGAAC ATTGTAAACC CTGAAGATCA 60 

CTTCGTTGTC ATTGTTTCAG GTGCCTTTGG TAACCGATTT AAACAAATTG CACAAACTTA 120

TTACAAAAAT GTGCATATTT ATGACGTAAC ATGGGGAGAA GCTGTAGATG TCAAAGATTT 180 

CATCAATTTC CTTTCAACTT TAAATGTTGA AGTTAAAGCA GTATTTAGTC AATATTGCGA 240 

AACATCTACG ACAGTGCTAC ACCCTATTCA CGAGTTAGGA AATGCCATTA ATCAATTTAA 300 

TAGTAATATT TATTTTGTAG TTGACGGCGT AAGTtGCATT GGTGCTGTTG ATGTTGACAT 360 

TAACAAAGAT AAAATTGATG TACTTGTTTC TGGTAGTCAA AAAGCAATTA TGTTACCTCC 420 

AGGATTAGCT TTTGTAGCTT ATAGCCACCG TGCAAAAGAA CATTTCAAAG AAGTAACTAC 480

GCCAAAATTT TATCTAGACT TAAATAAATA CATTTCGTCA CAAGCTGACA ATTCTACACC 540

GTTCACACCA AATGTGTCTT TATTTAGAGG TGTAAATGCA TACGTTGAAA CCGTAAAAGC 600 

AGAAGGTTTC AATCACGTAA TAGCACGACA CTATGCAATT AGAAATGCAT TAAGAAGCGC 660 

CTTAAAAGCA TTAGATTTAA CTTTATTAGT CAATGATAAA GATGCATCTC CAACGGTTAC 720

AGCATTCAAA CCTAATACAA ATGATGAAGT GAAAATAATC mAAGATGAAC TTAAAAATnG 780

CTTTAAAATA ACAATTGCnG GTGGTCAAGG CCATCTTAAA GGTCAAATTT TnAGAATTGG 840 

TCATATGGGG AAAATTAGTC CTTTCGATAT TTTATCGGTA GTATCTGCTT TAGAAATTAT 900 

TTTAACTGAA CACCGTAAAG TTAACTATAT CGGTAAAGGT ATATCAAAAT ATATGGAGGT 960

TATTCATGAA GCAATTTAAT GTACTCGTTG CAGATCCCAT ATCAAAAGAT GGTATCAAAG 1020 

CATTATTAGA TCACGAACAA TTCAATGTAG ATATTCAAAC TGGCTTGTCC GAAGAAGCAT 1080 


TAATCAAAAT TATACCTTCA TACCATGCTT TAATCGTTCG TAGTCAAACT ACGGTTACTG 1140 

AAAATATCAT AAATGCTGCT GATTCTTTAA AAGTAATCGC ACGCGCCGGT GTTGGTGTAG 1200 

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10 






GTAATACGAT TTCAGCTACT GAACATACAC TGOCAATGTT ATTATCAATG GCACGAAATA 1320

TTCCGCAAGC ACACCAATCA CTTACAAATA AAGAATGGAA TCGAAATGCA TTTAAAGGTA 1380

CTGAGCTTTA TCATAAAACA TTAGGTGTCA TTGGTGCTGG TAGAATTGGT TTAGGTGTTG 1440

CTAAACGTGC GCAAAGTTTC GGAATGAAAA TACTAGCTTT TGACCCTTAC TTAACGGATG 1500

AAAAAGCAAA ATCTTTAAGC ATTACGAAGG CAACAGTTGA TGAGATTGCC CAACATTCTG 1560

ATTTCGTTAC ATTACATACA CCACTAACAC CTAAAACAAA AGGCTTAATT AATGCTGTCT 1620

TTTTTGCCAA AGCAAAACCT AGTTTGCAAA TAATCAATCT GGCACGTCGT GGTATTATTG 1680

ATGAAAAGGC GCTAATAAAA GCATTAGACG AAGGACAAAT TAGTCGGGCA GCTATCGATG 1740

TGTTTGAACA TGAACCTGCA ACTGACTCGC CTCTTGTTGC ACATGATAAA ATTATTGTTA 1800

CACCTCATTT GGGTGCTTCA ACAGTCGAAG CTCAAGAAAA AGTGGCAATT TCTGTTTCAA 1860

ATGAAATCAT CGAAATTTTA ATTGATGGTA CTGTAACGCA TGCAgTGAAT GCACCTAAAA 1920

TGGACTTAAG CAATATAGAT GATACTGTAA AATCATTCAT CAATTTAAGC CAA 1973

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7620 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

GGTGTTTCAG ATGTCACTGG TTGATTTTTA ATTGTAGACG GGTATTTTGG GCTTTCGCCA 60

TATTTATTTG CCGGCTTACT GTCAAAGCAT AGGAATACTA TCATAACAAT TGTTAGGCCT 120

AAAT^AACAA AATAAAGAAG TACTAACAAA ATATTAAGAC CCATCGGCAT TAATGTAAAA 180

TCACTGTCAT AATAACTATC GATAATCTGT AATACTATAT AAAATATAAT ACTGAATACT 240

GTCATAATCA TTGGAAATAA CATTGTTCTT GATATATOGT GAAATCTTCG AACGCACAAC 300

GCTAAATTTG GAATAAACGT TGCCAAACTA TAGACAAAAG TATACACAGA TGTAAGGATA 360

ATCATCAATA TACTCATAAC TATTAATGTT TCGTTATCCG CCGCTATAGA AATAAAGAAT 420

AGAAATAGGT TTATTATTAG CACACACACA GCTGGAACCA TAAGTATCAA ATGCCATAGT 480

GCCATATACC AATATTCACT ACGTCTTGAT CTCCCCTTAA AATTTACATA ATTTTTCCAA 540

AATAAAACGA ATGATTTCAT AAAACCTACT TGAGGTAATT GTTCCATTGT AATCTCCCTT 600

TCGTTAATCA TATTTATATT TTTAATTATT GTTACCGTTA TAATTTACAA GATTCATTAT 660
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GTAAAATGAA AACCCGCTAC AAGTACACAT CTATATGGAG ACTCATTTGA AAQTCAACOC 780 

TTCGTTAACT ATACTAAAAA TATGTCATAC TGCAATGTTC ACGTTTAAAA GAGTCTCAAT 840 


CTATGCAAAT AAAATATTCC ATAACAAAGT ATATACTTTA CATTTTTATA ATTCTTAACA 900 

ATACTATTTT ATCAAACATT TACCACAATA AAAATATCTT TTTCATTTTT ATTTAAATTA 960 

ATCATATAAT TGCGAGGAGA ATATTATGOA TTTCGTTAAT AATGATACAA GACAAATTGC 1020 


TAAAAACTTA TTAGGTGTCA AAGTGATTTA TCAGGATACC ACTCAAACGT ATACAGGCTA 1080 

CATCGTGGAA ACGGAAGCTT ACTTAGGTTT GAATGATCGT GCGGCTCATG GCTATGGCGG 1140 

TAAAATAACA CCTAAAGTCA CGTCATTATA TAAACGTGGT GGTACAATTT ATGCACATGT 1200

CATGCATACG CATTTAGTCA TTAATTTTOT AACAAAATCT GAAGGTATAC CTGAAGGCGT 1260 

ACTTATCCGC GCAATTGAAC CAGAAGAAGG TTTATCCGCT ATGTTCCGTA ACAGAGGTAA 1320 

GAAAGGCTAC GAGGTAACGA ATGGCCCAGG AAAATGGACT AAGGCATTTA ACATTCCACG 1380

GGCTATCGAT GGCGCTACGT TAAATGACTG TAGATTGTCT ATTGATACTA AGAATCGTAA 1440

ATATCCTAAA GATATTATTG CTAGTCCACG AATCGGTATT CCAAATAAAG GTGATTGOAC 1500

ACATAAATCT TTACGTTACA CAGTGAAAGG TAATCCATTT GTGTCTCGCA TGCGTAAATC 1560

AGATTGTATG TTTCCCGAAG ATACTTGGAA ATAAATGCCA TCTTTCATTG ATTACTATCA 1620

TGAAAATGAA ATCTATCTCC TTATAAGTCA ATCAATCGTG CCGTCAACAT GCGGATGGGT 1680

TGATTGTTTT TCTTTGTATC CATCATATTT TTTGATTCAT CTCCTCTTAT TGAACTTGTT 1740

CTTAATTATA AAATATAACA ATAGAATTAT TTATAATTAT TAAATTTAGA TGCATTAATA 1800

TTATTGATAT TATTTTCAAA AACTAGAAAT ATTGATTTGT TGCATGTATA ATGTTAAAAG 1860 


CGCCCTTTTA TAACGCTTAC ATATAAAAGC TTATTTAGGG AGAGGGATAT TCAACAAGGG 1920 

GGATTTGAAA ATGATAGAAC TTAATGCAAT TACAACATTA TGTTTAGCTT GTATCCTTTA 1980 

TTTACTTGGT AAGGCTATCG TTAATCACGT TAATTTTTTA AAACGTATTT GTATACCAGC 2040

ACCAGTGATT GGCGGCTTAA TCTTTGCTAT TTTAGTTGCG GCTTTGGATT CATTTGGCAT 2100 

GGTTAAGATT AAATTAGATG CTTCATTCAT TCAAGATTTC TTCATGTTAG CATTCTTTAC 2160

GACAATCGGT CTTGGTGCAT CATTGAAATT ATTTAAATTA GGTGGCAAAG TCTTGCTATT 2220

ATACTTTATG TTTTGTGCTA TCATTTCAGT CATTCAAAAC ATAGTTGGTG TATCACTAGC 2280 

AAAAGTATTA AATATTAAAC CTTTGTTAGG ATTAACAGCA GGTTCCATGT CTATGGAAGG 2340 


CGGTCATGGT AATGCTGCTG CTTATGGTAA GACAATTCAA GATTTAGGTA TTGATTCGGC 2400 

ACTGACAGCG GCTCTTGCAG CTGCAACTTT AGGTCTTGTA TTTGGAGGGC TTATCGGTGG 2460 
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ATTTAAAGAT TATAGCCAAG TAGCATATAA CGAACATTTA CATAGTAAAT TTAATGCCAC 2580

TGAAGTATTC TTCATTCAAT TTACAATCGT TGTATTCTGT ATGGCAGTTG GAAGTTATTT 2640

CAGTCATTTG TTTACAGCTC AAACAGGGAT TAATGTTCCA ATTTACGTTG GCTCATTATT 2700

TGTAGCTGTT ATTGTCCGAA ATATCTCTGA AAGTTTTAAT TTTAATATTG TAGATTTAAA 2760

AATTACTAAT CAAATTGGCG ATGTCGCATT AGGTATTTTC TTATCTCTTG CGCTAATGAG 2820

CATTCAATTA ATCGAAATTT ATAAACTTGC TATACCTCTT ATTATTATCG TTTTAGTTCA 2880

AGTTGTCGTT ATGATTTTAT TTGCTGTTTT AATTTTATTT AGAGGTTTAG GAAAAGATTA 2940

TGATGCTGCA GTAATGGTAG GTGGTTTTAT CGGTCATGGG CTTGGTGCAc GCCAAATGCC 3000

ATGGCAAATT TAGATGTTAT TACTAAAAAA TATGGAAACT CACCTAAAGC ATATTTAGTT 3060

GTACCTATTG TTGGTGCATT CTTAATCGAT TTAATTGGTG TTATAGTCAT TATGGGATTC 3120

ATACAATGGT TTAGTTAAAC ACCAAACTCA TAAATAAAAG AGGAGGCCTT CGCCTCcTcT 3180

TTTATTTATC CTCGATGTAT ATTCAAGTTA CGTTGTTCTA TCCATGACAA TATTTCCGGA 3240

CTAAATACGA TTTGTTTTTG TGTTAAGTCG TCAATATTTT TAGCATCTAA CATCGTCATT 3300

ATTGATTTCA TGTGTTCAAT AAATGATTCT ACATAAGCTA CTGTATGTGC AATGCCATTA 3360

TTTTCAACTT GATTTAAAAA CGGACGTGAC ATACCAGTTG CCTTTGCACC AAGTGCTAAA 3420

CTTTTAATTG CATCGAGTGG TGTACGTAAA CCACCACTCG CGAAAACTGA AATTTCGCTT 3480

TGATAAGCCG TTGTTTCAAG TAATGACTCA ACTGTAGACT GTCCCCATGA TGATAAGTAA 3540

TCCATATCTT TATTTGCACG ACGTTCATTT TCAATATCTA CAAAGTTAGT ACCACCTTTG 3600

CCACTAACAT CGACATACTT GACGCCTATT TGTTGTAAGT CATGCATTAA TTCTTTGCTC 3660

ATACCAAATC CAACTTCTTT TATAATGACT GGAACAGACA CTCGTGATAC AATCGACGCT 3720

ATA'CtATCTA ACCAAGTCAC AAATTCACGA TTCCCTTCAG GCATAACTAA TTCTTGAGGA 3780

GAATTAACAT GGATTTGTAA CGCTTGTGCC TCAAGTAATT CAACTGCTTC CAAAGCCTTT 3840

TCTACTGGTA CGTCCGCACC AACATTGCTA AAAATCATGC CTTCAGGATT CATTTTTCGC 3900

GCAATCGTAA ACGTCTCAGC CATGCGTGGA TTTCTCAATG CCGCATGTGT TGATCCAACT 3960

GCCATCGCTA AGCCAGTTTC TCTTGCAACT ACAGCTAGCT TTTCATTGAT GTTTTTCGTC 4020

CACTCGCTAC CACCCGTCAT TGCATTAATA TAAACCGGAT ATGCCATCGT TAAGTCAGGC 4080

GTCTGTGATG TCAAATCGAT ATCATTTACA TTAATTGATG GGATAGAATG ATGCACAAAA 4140

CGCATCTTAT CAAAATCTGA ATGCATTGCG TCAGATTGGG CCATTGCTAT TTCAACATGT 4200

TCATTTTTTC TCTGTTCTCT TTGAAAATCA CTCATGATTA AACCTACCTT TTCGTCATTT 4260
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ATTACAGCTA AGCAAATATA ATATCCATAA TGTAAATGTA ATGCCGGCAT ATTTACAAAG 4380 

TTCATACCAT AAATCCCAGC TATGAATGTT AACGGTGAAA ATATAACTGA TACTAATGTC 4440

AGTACTTGCA TAATACTATT CATTCTAAAT GACGTGTATG ACTCAAAATT TTCTCGTATT 4500

TCGTTTGTCA TTTCTTGAGC AGTACGAATG ATATTAOGTT GCTTAATCAA GTGGTCATCG 4560

ATATGTTGAA TGTATAGCGA ATGTTTATTA TCTATAATCA AATCACCATT TTGTTTCATT 4620

GTATCAATTA GCTCTTGCAT AGGAAACAGT ACACGTTTTA CTTTAATCAA ATCCGAACGT 4680

AACTTAAAGA CACTATCCAT GACCATTTTA TTAAAGCGAT CATCTACATG GCGGTCTTCA 4740

AAATGATAAA CACTATCTTC AAGTGCATAT ACAAAGTTGA AATATTTATC AACCATCATA 4800

TCTAAAATTA ATATGACGAC ATCTGCACAA TCTAATTCTG CATCTAATGT ATTCATATAC 4860

TTATAGACTA CTTTATTTAA TGATTCCAAC GTTTGATGAT GATATGTTAC TAATACATTG 4920

TCTTGTATAA AAATATTTAG TGCTATTGGT GAATAGTTTG ACCCCATAAT ACTATGGAAT 4980

ACTAAGTATT GATAATCTTT ATAAGATTTA TATTTAGCTC GTGGCATACC GTTAATTGCA 5040 

TCATCCACTT CTAAATCATT AAAATTAAAA TGTGCTTTAA ACCATTCATT TTCTTGTTCA 5100 


TTCGGTTCAT CAAAATCATA CCAAACAATA GTCGCATCTT TTGGTATCTC TTTGATATCA 5160 

TCAACTACTT TAAACGGTTC ATATGTAGTT TGATACCGTA TCTTTAAAGC CATCGATACT 5220 

CCCCCTAAAT AACGAATTCT CTATTATTTT ATCATGAATT AAATAACGTG TATGTCTTAA 5280 

TTTATTTTAG TATGATAGTC ACTAAGGAGA TGGTTATTAT CAAACAACTT TTTACACATA 5340

CTCAAACCGT AACATCTGAA TTCATTGACC ATAACAATCA TATGCATGAT GCAAATTATA 5400

ATATCATTTT TAGTGACGTC GTGAATCGTT TTAATTACAG CCACGGTCTT TCTTTAAAAG 5460

AACGCGAAAA TTTAGCATAT ACGCTATTTA CACTAGAAGA ACATACGACA TACCTCTCAG 5520

AATTGTCTCT TGGCGATGTA TTTACTGTTA CTTTATATAT TTATGATTAC GATTATAAGC 5580

GGTTGCATTT ATTTTTAACA TTAACTAAAG AAGATGGTAC ACTAGCATCA ACAAATGAAG 5640

TAATGATGAT GGGAATTAAT CAGCACACAC GTCGTTCTGA TGCTTTTCCT GAATCATTTT 5700

CAACACAAAT AGCACACTAT TATAAAAATC AATCAACTAT CACTTGGCCT GAACAATTAG 5760 


GACATAAAAT AGCAATTCCA CACAAAGGAG CATTAAAATG ACAGATGCAT TACAACAAAA 5820 

GATTCATATC GAATTACTAG ATTTATTAGA TGATGTTAAG TTTGAATTAA CAGAATTAAA 5880

TGCACAAAAA GGGTTATACA TTAACGGACC AGCAAATCAG CTACTTAAGC GTGGCGTGCA 5940

TATGGCTTAT GTTCAAGGAC AAAAGCAAGC CATCGATAAT ATTATGACTA TTGTGGAACA 6000 

ACAGCTTGAA AGATCAACAT TTCCTAGAAC ATTATGATAA ATTTCAAAAT GAGGTTGCTC 6060 
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ATAATTTTTT AGATCAATTT TATCAAATTA AAGGGCAATA CTTTATCATC ACACATATCA 6180 

ATACACTTAT TGGTGATTTT CACTCAGAAG CTCATTAACA ATTAGTCTAT ATAACCCTTG 6240 

CTATATTTTC AAAAACAAAA CCCAATTACG TTTTCATGTC AAATATCATC TTGCATGAAA 6300 

TCGTAACTGG GTCATTTATA TGTTATTAGT TATTTTGTGT TACATCCTCA TCTATCGATT 6360 

TGGCAATTTG TTTAATAGCT TTATGTGATT GTCTAATTGG ATAAATTGGA AAATCATGTA 6420 

CCATCTTAGG ATAATCATAA AACTCAATGT ATTGATGATG TTGCAACATC ATTTGTTCAA 6480 

ATAGCTTCAT ATCAGGATGT GTCATTTCAC GTCCACCACC AAACATATAA ACTGGTGGCA 6540 

ATCCTTCTAT TGTGCCATTA ATTGGCGATA TGCGCTTATC TGTTAATGGT AGGCCATTCG 6600 

CCCATTTTTT CATAATCTCA TTGACACCAA ACTGACTTAG aACCGCATCT TGTTCGATTA 6660 

AGGCGTCCGA AATATCTTTA TTAGATAGTG TTGCATCTAA AATTGGTGAG ATTAAATACA 6720 

ATTTATTCGG TAATGGCTGT TGATTAkCTA AAAGAGATTG TACAAAGGAT AATGCCAGTG 6780 

CACCACCTGA ACCATCACCC ATGACTACGA CATTTTGATG TCCTACTTCA GATACTAATT 6840 

GaTCATAAAC ACGTTGTATC GCTTGGnAAA GTATCGTCaA TATGnAAACT CTGGTGTCTT 6900 

TGGATAGATA GGCAGTACAA CCTCATATAA TGtACTTAAA GTGATTTTAT CCCAACAATC 6960

TCCAATGGAA CGGTGATGGT TGTAGTGCAT TGAATCCACC GTGAATATAT AAAATTTTCT 7020 

TATCAATTTG ATGTCTGAAA TTAAAGCGAA AGACTTGCAT ATCATCTAAT GACAATTTTT 7080 

CTAAATTTGC TTTAACATTT AATGTTGAAG GCTGCTTATG TTTTTTTCTA TTTTCAATTT 7140 

CTCTTTTATA AAAAAATCTT TCAACATCTT GATCATTTTT AAACATAATC GAGCGATTGT 7200 

GAAGCAAATA TTTATTGACA ACGCTATTCA TAACACGGTT TCTAATCAAT GTCTTAACCT 7260 

ACCTTTATAT ATTTTATGTA TCCAATGATk GTCTATCCCC TACATTCTTT GCCAAAAAAA 7320 

GTAXATAATG TAGAAGATAT TTTCTTTTTC ACTTTCAAAT TTAAGACTAC AATTGAACAG 7380

TGATTTTTCA TCATTATAAC AGACAACTAG ACATATTGAT AAGTAAAGAA AAGAACTTTA 7440

TACGGAGGTA CCTTGCATGA CAAATCCAAA TCAACGATTA GAACCATTTG ATGAGACATT 7500 

TCAACAACCG AATATTCATC GTGGTAAGCG ATATGGTAAG AAAAAACGTT CATTGGTAAG 7560 

CATGATTATT CAAATCATTG TTGTwATATT AACCACCATC GCTGGAATAC AGCATGGTGG 7620 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9834 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:








GTCATtACCG 


amTTTCtTAG 


AaTCATTTAA 


AGATGATAAA 


TATACAAACG 


TTGGTAATTT 

A X Ww X X A X 


€ 0 


5 


AAAAGAAGTG 


AATTTTGATA 


AAATTGCTGC 


GACGAAACCC 


G AAGT AAT P?T 


X X X ^ X w X Mw 






ACGTACAGCT 


AATCAAAAGA 


ATTTAGATGA 


ATTCAAAAAA 


GCTGCACCTA 

VJV, XV^VboTW^w X f\ 




i On 


10 


TGTTTATGTT 


GGTGCAGATG 


AAAAGAACTT 


AATTGGTTCA 


ATGAAACAJVA 


A PA iTTY? AA A A 


*^ V 


TATCGGAAAA 


ATTTACGATA 


AAGAAGATAA 


AnrrAAAPiAA 


TTAAATAAAfi 


ATTTAftATAA 


rt A 




CAAAATTfSPT 


TCAATGAAAG 


ATAAAACGAA 






TGTATTTACT 


360 


IS 


X 4/v\V.vnn 


GGTGAATTAT 


CAACATTTGG 




CGTTTTGGTG 


GATT AG TTTA 


420 




cxz ata raTTa 


GGATTCAATG 


CAGTTGATAA 


?V JV TV T\./** , T , TA *pfp 

AAAAuiAAu I 


AA 1 AljCAA x A- 


ATGGACAAAA 


460 




X \a liiUl AAL 


GAATATGTTA 


ATAAAGAAAA 


TCCAGATGTT 


ATTTTAGCGA 


TGGATAGAGG 


S40 


20 


x LAAt? UUATA 


AGTGGTAAAT 


CAACTGCGAA 


ACAAGCATTA 


AAT AAT CCTG 


rn m f gift ■ » » « m -m m 

TATTAAAAAA 


600 




•PPTTl TV TV <*■*/""• TV 


ATTAAAGAAG 


ACAAAGTATA 


TAATTTAGAT 


C CT AAATT AT 


GGTACTTTGC 


660 




AGCTGGATCA ACTACAACTA CAATTAAACA AATTGAGGAA CTTGATAAAG TTGTAAAATA 720

ATTTTAAAAG AGGGGAACAA TGGTTAAAGG TCTTAATCAT TGCTCCCCTC TTTTCTTTAA 780




IV »V TV 7V^"t/*» K K k rri 

AAAAGGAAAT CTGGGACGTC AATCAATGTC CTAGACTCTA AAATGTTCTG TTGTCAGTCG 840

TTGGTTGAAT GAACATGTAC TTGTAACAAG TTCATTTCAA TACTAGTGGG CTCCAAACAT 900


30 


7\ TV T\ j\ TV'I—IT* 


GATTTTCAAT 


TTCTACTGAC 


AATG CAAGTT 


GGCGGGGCCC 


AAACATAGAG 


960 




AATTTCAAAA AGGAATTCTA CAGAAGTGGT GCTTTATCAT GTCTGACCCA CTCCCTATAA 1020

TGTTTTGACT ATGTTGTTTA AATTTCAAAA TAAATATGAT AGTGATATTT ACAGCGATTG 1080


TT A A & P CTl A fl 


ATTGGCAATT 


TGGACAACGC 




TAT ATT CATT 


GATTGTTAAT 


1140 




TPl^Tfl' 1 ™ I " I P* 


ATACACCGCA 


TAAGATTGCT 


TTTTCGTTAA 


7\ tv™> j\ »|ip 

A iuAAUu^ 1 C 


AGAC CAACGC 


1200 


40 


TT A ATPV^mT 
x. a nn x vtvjv, vj x 


GCTTTTCAAA 


CTCATTATGG 




ATW3ATA - TA 


T TT ATT A CAA 


1260 




PATTTA A A TT 


TAATAGCAAT 


AATATCTTCT 


TTY5T3TAA A AT 
X V— X nnnn X 


A ATY^nrY3 A C A 


S Cy iu ill tJlA 


1320 




GTATCGATTA 


*\ X UiviLLAlA 




ATAGACAAAG 




TAPflATTPPT 

J. rW^\jf\ x XV_v«X 


i *3 q n 


TTGGATGTTC ACCAATAATG CGAACTTCAC GATTTAATTC AATGCCAAAT TTTTCTTTGA 1440

CGGTCTTTTG TACATAATGA ATAAGGTTTT CATAATCTGT AGCAGTTCCA TTGTCTACAT 1500

TTACCATAAA ACCAGCGTGT TTGGTTGAAA CTTCAACGCC GCCAATACGG TGACCTTGCA 1560

AATTAGAATC TTGTATCAAT TTACCTGCAA AATGACCAGG CGGTCTTTGG AATACACTAC 1620

CACATGAAGG ATACTCTAAA GGTTGTTTAG ATTCTCTACG TTCTGTTAAA TCATCCATTT 1680
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AGTGTTCTTT TTGAATAATG CTATTACGAT AATCTAACTC TAATTCTTTT GTTGTAAGTT 1800 

TAATTAACGA GCCTTGTTCG TTTAOGCAAA GCGCATAGTC TATACAATCT TTAACTTCGC 1860 

CACCATAAGC GCCAGCATTC ATATACACTG CACCACCAAT TGAACCTGGA ATACCACATG 1920 

CAAATTCAAG GCCAGTAAGT GCGTAATCAC GAGCAACACG TGAGACATCA ATAATTGCAG 1980 

CGCCGCTACC GGCTATTATC GCATCATCAG ATACTTCGAT ATGATCTAGT GATAATAAAC 2040 

TAATTACAAT ACCGCGAATA CCACCTTCAC GGATAATAAT ATTTGAGCCA TTTCCTAAAT 2100 

ATGTAACAGG AATCTCATTT TGaTAGGCAT ATTTAACAAC TGCTTGTACT TCTTCATTTT 2160 

TAGTAGGGGT AATGTAAAAG TCGGCATTAC CACCTGTTTT AGTATAAGTG TATCGTTTTA 2220 

AAGGTTCATC AACTTTAATT TTTTCATTTG GGATAAGTTG TTGTAAAGCT TGATAGATGT 2280 

CTTTATTTAT CACTTCTCAG TACATCCTTT CTCATGTCTT TAATATCATA TAGTATTATA 2340 

CCAATTTTAA AATTCATTTG CGAAAATTGA AAAGAAAGTA TTAGAATTAG TATAATTATA 2400 

AAATACGGCA TTATTGTCGT TATAAGTATT TTTTACATAG TTTTTCAAAG TATTGTTGCT 2460 

TTTGCATCTC ATATTGTCTA ATTGTTAAGC TATGTTGCAA TATTTGGTGT TTTTTTGTAT 2520 

TGAATTGCAA AGCAATATCA TCATTAGTTG ATAAGAGGTA ATCAAGTGCA AGATAAGATT 2580

CAAATGTTTG GGTATTCATT TGAATGATAT GTAGACGCAC CTGTTGTTTT AGTTCATGAA 2640 

AATTGTTAAA CTTCGCCATC ATAACTTTCT TAGTATATTT ATGATGCAAA CGATAAAACC 2700 

CTACATAATT TAAGCGTTTT TCATCTAAGG ATGTAATATC ATGGAAATTT TCTACACCTA 2760

CTAAAATATC TAAAATTGGC TCTGTTGAAT ATTTAAAATG aTGctACCGC CAATATGTTT 2820 

TGTATATTTT ACTGGGCTGT CTAAGAGGTT GAATAATAAT GATTCAATTT CAGTGTATTG 2880 


TGATTGAAAA CAATTAGTTA AATCACTATT AATGAATGGT TGAACATTTG AATACATGAT 2940 

AAACTcCTTT GATATTGAAA ATTAATTTAA TCACGATAAA GTCTGGAATA CTATAACATA 3000 

ATTCATTTTC ATAATAAACA TGTTTTTGTA TAATGAATCT GTTAAGGAGT GCAATCATGA 3060 


AAAAAATTGT TATTATCGCT GTTTTAGCGA TTTTATTTGT AGTAATAAGT GCTTGTGGTA 3120 

ATAAAGAAAA AGAGGCACAA CATCAATTTA CTAAGCAATT TAAAGATGTT GAGCAAAAAC 3180 

AAAAAGAATT ACAACATGTC ATGGATAATA TACATTTGAA AGAAATTGAT CATCTAAGTA 3240

AAACTGATAC AACTGATAAA AATAGTAAAG AATTTAAGGC ACTACAAGAA GATGTTAAAA 3300 

ACCATCTCAT ACCTAAATTT GAAGCATATT ATAAGTCAGC AAAAAATTTG CCTGATGATA 3360 

CAATGAAAGT TAAGAAATTA AAAAAAGAAT ATATGACGCT TGCAAATGAG AAGAAGGATG 3420

CGATATATCA ATTAAAAAAA TTCATAGGTT TATGTAATCA ATCTATCAAG TATAACGAAG 3480
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AATTAGCTGA TAATAAAAGT GAAGCAACTA ATCTTACGAC AAAATTAGAA CATAATAATA 3600 

AAGCGTTAAG AGATACTGCG AAGAAGAACC TAGATGATAG TAAAGAAAAT GAAGTAAAAG 3660 

GCGCGATTAA AAATCACATT ATGCCAATGA TTGAAAAGCA AATTACCGAT ATTAACCAAA 3720 

CTAATATTAG TGATAAGCAT GTTAATAATG CAAGGAAAAA CGCAATAGAA ATGTATTACA 3780 

GTCTGCAGAA CTATTATAAT ACACGTATTG AAACAATAAA GGTTAGTGAG AAGTTATCAm 3840

AAGTCGATGT AGATAAGTTG CCGAAAAAGG GTATAGATAT AACTCACGGC GATAAAGCCT 3900

TTGAAAAAAA GCTTGAAAAA TTAGAAGAAA AATAACTATA ATCATTTTTC AAAGTTAAAA 3960 

ATTTTGAATT TATGGTTAAC ATGTCAACTT ACTATGTGTA TAATGGTAAA CATTGATATT 4020 

AACTATATGT ATAAAAATGT CACGCAGATG CTATTTAAAT GTGATAAATA TTTTTAGAGG 4080 

TGAATAGAGT GGCTATAAAG CTAAGTTCAA TTGACCAATT TGAACAGGTT ATTGAGGAAA 4140

ATAAATATGT TTTTGTATTA AAACATAGTG AAACTTGTCC AATATCGGCA AATGCGTACG 4200

ATCAATTTAA TAAATTTTTA TATGAACGCG ATATGGACGG TTATTATTTG ATTGTCCAAC 4260

AAGAACGCGA TTTGTCAGAT TATATTGCTA AAAAAACGAA CGTTAAACAT GAATCACCTC 4320

AAGCATTTTA TTTTGTAAAT GGTGAAATGG TTTGGAATCG AGACCACGGT GATATCAATG 4380

TGTCGTCATT AGCACAAGCA GAAGAATAAT GAAACTATAG GGTTGGAACA TTTTGCCTTA 4440

CACTAGTAGA CGTGAATAGC ACAACTTAAA TTCGTGTGAA TCAGAGTAGT TTGGCTATAA 4500 

TGATGTTCTG ACCTTTTATT TTATGTCACC TTTAGAAGCA GTTAAGTTAG TACTTTTTTA 4560 

CAAACATATG TATAATATAT TCGAGTATTT TTATTGAAAa tATTTTGGAA AACGACGAAT 4620

CCAATAAGAA AATTTAAACA TGATTTGTAA GTTAGTTTAA TAGGAAATAT ATGCTAAACC 4680

AAAAGAAGCA TATTGTTATT TACTGGAATA ATTAATAATC ATGTCATGTT AAATGTTAGC 4740

ATATAATCAC GAGATAAAAT CTAAAATTTA AGATTAATCT TTTATGAATA AAAAACGTAT 4800 

CACAACAAAT AATAAAGTAA GGTGGTCAAG GTTATGAAAG TATTAGTAGC CATGGATGAG 4860

TTTCATGGAA TTATTTCAAG TTATCAAGCT AATAGATATG TTGAAGAGGC AGTTGCAAGC 4920 

CAAATTGAAA CTGCAGATGT AGTTCAAGTA CCATTGTTTA ATGGAAGACA TGAATTATTA 4980 

GATTCTGTAT TTTTATGGcm ATCTGGGcaA AAGTATCGTA TACCAGTACA TGATGCAGAT 5040

ATGAATGAAG TTGAAGGTGT TTACGGACAA ACTGATACAG GGATGACCGT TATCGAGGGG 5100 

AATTTATTTT TAAAAGGTAA AAAACCAATT GTTGAACGAA CAAGTTATGG TTTAGGAGAA 5160 

ATGATTAAAC ATGCATTAGA TAAOGACGCA AAACATGTTG TAATTTCACT AGGTGGGATT 5220 

GATAGTTTTG ATGCTGGTGC AGGTATGTTA CAAGCATTAG GTGCTCAATT CTATGATGAC 5280
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GATATGTCGA ACTTACACCC TAAAATGGAA ACAGCAAGAA TTCAAGTAAT GTCGGATTTT 5400 

TCAAGTCGAT TATATGGTAA GCAAAGTGAA ATCATGCAAA CTTATGATGC GCATCAGTTG 5460 

AATCATAATC AAGCAGCAGA AATCGATAAT TTAATTTGGT ATTTTAGTGA GTTATTTAAA 5520

AGTGAATTGA AAATTGCAAT TGGTCCAGTT GAACGTGGTG GTGCTGGTGG TGGAATTGCA 5580

GCAGTCTTGA ATGGACTGTA TCAAGCTGAA ATATTAACCA GTCATGCATT AGTAGACCAA 5640

CTAACACATT TAGAAAATTT AGTTGAACAA GCGGATTTAA TTATTTTTGG AGAAGGATTA 5700 

AATGAAAATG ATCAGTTGCT AGAAACGACA ACATTGCGTA TTGCAGAACT TTGTCATAAA 5760 

CATCAAAAGG TTGCCATTGC AATTTGTGCA ACTGCTGAAA AGTTTGATTT ATTTGAATCA 5820 


CAAGGGGTTA CAGCAATGTT TAATACATTT ATCGATATGC CAGAAACTTA TACTGACTTT 5880 

AAAATGGGtT ACAAATTAGG CATTATACGG TTCAGTCTTT AAAACTGTTG AAAACACATT 5940 

TTAATGTTGA GGTTTAGTAA AGAAGGACTA AATTGGTGAT GCTGTCATGA TGGTTAATAA 6000

CATTTATGAT GGTTAGCAAA ACGAATTAGA AGATCGAAAG TATACGTAAA AAATATGAAA 6060 

AATCACGCTA TCATTGCACT GAATGTTAGC GTGATTTTTA TATATTAATT AAGCCTGAGT 6120 

TGAACTAGTA TATAATCGTT GGTTTTTAGT GATTTTCAGC GATATCTTCT ACAATTCCAA 6180

TGATTACTTG TACTGCTTTT TCCaTAACAT CAATGGATGC aTATTCATAT GGGCCGTGGA 6240 

AGTTACCGCA ACCTGTAAAG ATGTTTGGAG TTGGTAACCC CATAAATGAC AATTGTGAAC 6300 

CATCTGTACC ACCGCGAATA GGTTCAGTGT TTGCTGGAAT ATCTAATTTG GCAAAGACAC 6360

GTTTAGGTAT ATCAATAATA TGAGGCAATG GTAATATTTT TTCTGCCATA TTGAAATATT 6420

GATCCGATAT ATCAACTTTA ACTGGATAAT TTTCAAAATG GGCATTGATA TCGTCACGTA 6480

TTTCTAAAAT ACGTTTCTTA CGCAATTCGA ATTGTTTTTT ATCATGATCA CGAATAATGT 6540

ATTGCAAAGT TGCTTTTTCA ACAGTTCCTT CAAAGTTCAT TAAGTGATAA AAGCCTTCGT 6600 

ATCCTTCTGT TCGCTCCGGA ACTTCACTAT CAGGTAGCAA ACTATCGAAT TGTTCACCTA 6660

AACGTATTGC GTTTACCATT GCATTTTTAG CTGAACCAGG ATGAACATTT ACACCGTGGC 6720

ATGTAATAAC CGCTTCAGCA GCGTTAAAGC TTTCATATTG TAATTCTCCA TATTGACTAC 6780

CATCCATAGT ATAAGCAAAA TCAGCATTGA AGCGGTCAAC ATCAAATTTA TGTGGACCAC 6840

GACCGATTTC TTCGTCTGGT GTAAATCCAA TGCGAATGGT ACCATGTTTA ATTTCTGGAT 6900

GTTCTTGTAA ATAACAAATA GCTTCCATAA TTTCCACAAT ACCCGCTTTA TCGTCTGCAC 6960

CTAGTAACGA TGTACCATCA GTTACCATTA ATGTATGACC AACTAAACTG TTAAGTTCTG 7020

GAAATACTTT AGGATCTAAG ACACGTTTAG TATTGCCTAG TTTGTATGGC TTACCATCAT 7080
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GCGCCAAAAA TCCAACTGTT GGGACGTCGA CATCGATGTT ACTTTCTAAT GTAGCAAATA 7200 

AGTAGCCATT TTCATCTAAA TCAGTTGGCA ATCCTAATTG TTGTAATTCT TTTTCTAATA 7260 

AATGTAACAA ATCCCATTGC TTTTCAGTTG AAGGTGTTGT TGTAGATTTT GGATCAGATT 7320 

GCGTATCAAT TGTCGTATAT CTTGTTAATC TATCTATCAA TTGGTTCTTC ATTATATTCG 7380 

ACCCCTTAAA CTCTATTATT CATGTTGTAA GATTTTTTAT ATGTCTTACC TTTGATTTTA 7440

CCATACAGTT GTTTGATACG TGTGTATAGG TAATATAGAA TTTCAGAAAC TAATATACCG 7500

AAAGCAATCG CACCTGAAAT CAGTGTAcTT CTAAAAATGT ATTTACAGCA CTTGTATAAT 7560

CATTTGATAC TAAAAAACGA GTCGCTTGAT AAGCTGCACC ACCAGGTACT AATGGTATAA 7620

TGCCTGGCAC TATGAATATA ATTACCGGTC GTTTATATCT GCGACTCATA GTATGACTCA 7680

TTAAGCCTAA AATTAAGCTT CCCAAAAATG AAGCGCCAAC TTTTCCAAAC TCTAAATCTA 7740

CCGTTAATTG GTAAATCGTC CATGCAATGG CACCCACAAA TCCACATGCT ACTAAGAGGC 7800

GTTTGGGTGC ATTGAAAATG ATAGAGAAAA GTACTGTTGA TATAAAGCTG ATTGTAAAAT 7860 

GAAATAAATA AAATAGCATG CTTTAACAGT CCTTCCTTAA ATGATTAATA AAACGATTGC 7920 

GACACCAGCA CCGATTGCGA ATGCTGTTAA TGCAGCTTCA ACACCGOGAG ACATACCTGC 7980

AAGTAATTCA CCCGCTAATA AATCTCGAAT GGCATTGGTA ATTAATATAC CAGGGACAAG 8040

TGGCATGACA CTGGCTATAG TAATGATATC TTGATTGGTT GCAATGCCTA ATTTAGTAAA 8100

TGTGGCTGCA ATGGATATGA CCACAGCGGC TGCAACAAAC TCTGAGAAAA ATTTAATTTG 8160

TATATAGCGT tGCACAAAGC TGAATGTTAA AAATGCGGAT CCGCCAGCAA TGACTGCAAT 8220

CCAACAATCT GATGCGACAC CACCAAACAT AAATAGGAAG AAGCCACATG CAATGGCAGC 8280

TGCAAAGAAA TTCGTTAAAA AAGAATATTG TAATGATGCA TGCTGTAAAT GAATAAATTC 8340

AGATTTAGCT TCATCAATTG TGAGTTCTTT ATTTGATATT TTACGTGAAA GACTATTCGT 8400

TAAAGCGATT TTCTCTAAAT CTGTTGTACG CTCTTGTACA CGAATTAATC TTGTACTTGT 8460 

TCGATCGTTT AATGAAAAAA TAATTGCAGT TGAACTGACA AAACTATATG TATTATGAAG 8520 

ACCATAACTA TGTGCGATAC GGTTCATTGT ATCTTCAACT CGATATGTTT CAGCACCTGA 8580 

TTCaAGTAAA ATTCTACCTG CAATTAATAC AACATCAATC ACTTTGTTTT CATCTATAAT 8640

TGTGATTGAA TCTGGCATAT CAATTCACCT CCAATGATAT GTGTTATTTA TTTGAACAAT 8700 

TGaAGTTTAC AACTTGTTGT TACAACTTTC AATAGTGAGA CTTTGTGTTA GTATGATGAA 8760 

CTTGTATGGT TCAAATTTAA ATAAGAAAAA CTGTTAATCT TTGCTATTAT ACTATGATTT 8820

AATAATAGCA AAGGATTAAC AGTTTTGTCG TTGTTATAAA TTGATAATAG GGTTAAACAT 8880 



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TTTACGCTGT GATTTTGGAT CGTCATCTGT TAAATAACCA ACACCGATAG ACACTGACAA 9000 

TTTAATAACT TCTTTGTTTG GTAAATGGAA TGATGATTTT TCAACACCCG AACGAATATT 9060 

TTCAGCTAAT TTAACACTTT GATCAAGTGA ATAATTGTGA ATGACAACTG AGAACTCTTC 9120

GCCACCATTT CTAAAAATTT TAAATTGATT CGGCACATAG TTTTTAAGTA ATTGAGACAT 9180 

TTGTTTTAAT ACAGCATCAC CTGATTTGTG TGAGTAGGTA TCATTGaCAT CTTTAAATCC 9240 


ATCGATATCG ATTAATAATA ATGCGATACT TTGATGTTCT TTTTCAGCTT TTCGTGAAAT 9300 

TTCATTTAAA TGTCTATCAA ATTCTTTTAC ATTACCTAAG CCTGTTAAGT AATCATATTT 9360 

ATCTTOGTTT TCATAACGAT TTACGAGTGA GAAGAAATGC CAAATATCGA CAAATGTTAT 9420 


CGCTGAAGCT AAAGTGATAA TTAATGAAAT TGGTATTAAA ATGATAACTT CCGATAGTGT 9480 

GTAAATAGGA CTCACTAACG CGACACCAAA TAAAATGATT ATTGTAACAA CATTAAGTAT 9540 

TAATAATGAT AGCACATCAT TTTGTTTTAA AAATGGTCCA ATAGCACTTG TTACTGCAGC 9600

AATAACAATC AACGTAACAC CGTACATAAT CGAGTTGTTA AATACTACAA TTTCAACAAT 9660 

TGCTACAATT ACTGTGGCAG ATAATGTATA GACCATATTT GTAAATCTAC CTAAAAACAA 9720 

TAAAGGAACG AATGTTAAGT GAATTAAATA ATCTTCACGA TAAGGGATAG GGTAGACAGA 9780

TAATAATAAT GATACGATTG TCATTAAAAC AGTGACATAA GCCTTAGAAA AAAC 9834 
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23439 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
TCTCAATCAG ATGAAAAATT GCATATCGTA GGTTTTACAG AAAGTGCAAA ATATAATGCG 60 


TCATCAGTCA TTTTCACGAA TGACGCTACC ATTGCCAAGA TCAATCCTAG ATTGACTGGA 120 
GATAAAATTA ATGCAGTTGT TGTACGTGAT ACAAATTGGA AAGACAAAAA ATTAAACCAA 180 

GAGCTTGAAG CGGTAAGTAT TAATGACTTT ATTGAAAATT TACCAGGTTA TAAACCACAG 240

AACTTAACAT TAAACTTTAT GATTTCATTC TTATTTGTCA TTTCAGCTAC AGTTATAGGC 300 
ATTTTCCTAT ATGTCATGAC ATTACAAAAG ACGAGTTTAT TTGGCATATT AAAAGCTCAA 360 

GGATTTACGA ATGGCTATTT GGCGAATGTG GTAATTTCGC AGACGGTCAT ATTAGCACTA 420
TTTGGTACGG CATTTGGCTT ACTGTTAACA GGCGTTACAG GTGCATTTTT ACCTGATGCA 480 
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TV^IViM' A'I M 1' A^I 


GAAGTTTATT CTCCATTTTA ACAATTAGAA AAATAGATCC CvX xAAAGGCG 600
J\ X luuulnuu AGGTGTAGCA AATGTTGAAA 1 X 1GAAAATG TAACAAAGTC ATTTAAAGAT 660
ftfltZ & A Trfl T A ACATTGAAGC GGTTAAAGAT ACAAATTTTG AGATAAAAAA AGGTGATATT 720
XoO TTGGACCTTC TGGCTCTGGT AAAAGTAGAT TTCTAACTAT GGCAGGTGCT 780
HACAAACAC CGACATCTGG GCACATTTTA ATCAATAACC AAGATATTAC GACAATGAAG 840
CAAAAAGCAT TGGCAAAAGT TAGAATGTCT GAAATAGGTT TTATTTTACA AGCTACAAAC 900
CTTGTACCAT TTTTAACGGT AAAGCAACAA TTTACATTAT TGAAAAAGAA AAATAAGAAT 960
GTTATGTCTA ATGAAGACTA TCAGCAACTT ATGTCACAAT TAGGTCTAAC TTCATTGCTT 1020
AATAAGTTAC CTTCAGAAAT TTCAGGTGGT CAGAAACAAC GTGTGGCGAT AgCaAAGCGT 1080
TATATACGAA TCCGTCGATT ATTTTA(VK3fl X X X X ^IVJX^VTV? ATGAACCTAC CGCGGCGTTA GATACTGAAA 1140
ATGCGATTGA ACZTC BTTIIAA X X X nnn A*!"!** "I'RffyfY! ATCAAGCCAA ACAAAGAAAG AAAGCATGTA 1200
TTATTGTTAC & P ATY3 ATY2 A A ORAfTTAAArt CATATTGTGA TCGTTCATAT CATATGAAAG 1260
ATGGCGTCCT A ATYj A & A f* Af2 Art X swvin Uiv TAGAATAGTT TTATTAAGCC GGTACATCAT 1320
GTGCCGGTAT 111 1/ilul X X .riXvXnX ini X TGAATAAACT TTCACATTCA ATTAATAATA 1380
ATTATTATCG A A A ATT*AfZA & ATA TTTT^flTV; AAATATAATA TTTTTTGTAG TAAAATGGCC 1440
TCTAAGTATT *_rv\l>\X X inn X n X wOVj%jx\ X TGAATATAAA ATTATCGTAA TGGGGGTCAA 1500
TGGTTATGGA TTTA'PTVI ATA vyvj X nlw XXX n X TTTTATTTTT GGTCTTAGTG ATTTTTACAT 1560
TATTTACATA TAAAGCGCCT AATGGTATGC GTGCCATGGG AGCATTAGCT AATGCAGCAA 1620
TCGCAACATT TTTAGTGGAA GCATTTAATA AATATGTTGG TGGCGAAGTA TTCGGTATTA 1680
AATTTTTAGA AGAGCTAGGA GACGCTGCGG GAGGTCTAGG TGGTGTCGCT GCCGCTGGAT 1740
TAAC&GCATT AGCTATCGGT GTGTCACCAG TATATGCATT AGTTATAGCA GCCGCGTGCG 1800
GTGGTATGGA TTTATTACCA GGTTTCTTTG CGGGTTATAT GATTGGATAT GTGATGAAAT 1860
ATACAGAGAA ATATGTGCCG GATGGTGTCG ACTTAATTGG ATCGATTGTC ATCTTAGCGC 1920
CATTAGCTCG TCTTATTGCA GTATTATTAA CGCCAGTAGT GAATAGTACA TTGATTCGAA 1980
TTGGTGATAT TATCCAAAGT AGTACGAATA CGAATCCAAT TATCATGGGT ATCATTTTAG 2040
GTGGTATTAT TACGGTTGTC GGCACAGCGC CATTGAGTTC AATGGCATTG ACAGCATTAT 2100
TAGGTTTAAC GGGTGTACCT ATGGCTATTG GTGCCATGGC AGCATTTAGT TCGGCATTTA 2160
TGAATGGGAC GCTATTCCAT CGCTTAAAAT TAGGTGATCG TAAGTCTACG ATTGCAGTAA 2220
GTATTGAACC TTTATCACAA GCAGATATTG TATCAGCCAA TCCAATTCCA ATCTATATTA 2280
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ATGCGACAGG 


TACAGCTACA CCGATTGCAG GATTTTTAGT TATGTTTGGA TTTAATCATC 2400
CGACGACAAT TGTGATTTAT GGTGTAGTAA TGGCGATTGT AGGTGCGCTT GCAGGTTATC 2460
TTGGTTCAAT TGTATTTAAA AAATATCCAA TTGTTACTAA GCAAGACATG ATTAATCGAG 2520
GTGCAGTAGA CGCATAGCAT CATCATATTG AATAGTAAAA ACAAATAAAA CATAGTAACG 2580
TGATTCAGTC GATf5TAAPAf3 X t»VU\ X X\»X\ GTCACGTTTT TTTATAGAAA AATACAAGAC 2640
ATAAAAATGT TATCATACTG TATAAACATT TATCATTTTC 2700
TCAAGTACCT TACTTTTTAC GAAATTATGC GTATTTTATA 2760
AACAAATATC ATTGATATAA CGGTAAATGT AAGCGTTTAC AACAGAAATA ACAGCATGCT 2820
ACGATATTTT TGTAAATTCA CTGATTCAAG TATTTTAAGT CAATATGAGG AGGGATGTTA 2880
TGAGCGATTC TGAGAAAGAA ATTTTAAAAA GAATTAAAGA TAATCCGTTT ATTTCACAAC 2940
GTGAACTTGC TGAGGCAATT GGATTATCTA GACCCAGCGT AGCAAACATT ATTTCAGGAT 3000
TAATACAAAA GGAATATGTT ATGGGAAAGG CATATGTTTT AAATGAAGAT TATCCTATTG 3060
TTTGTATTGG CGCAGCGAAT GTAGATCGTA AGTTTTATGT GCATAAAAAT TTAGTTGCAG 3120
AAACATCAAA TCCTGTAACG TCAACACGCT CTATTGGTGG CGTAgCAAGA AATATTGCTG 3180
AGAACTTAGG TAGGCTTGGC GAAACGGTCG CT1TTTTATC TGCTAGTGGA CAAGATAGTG 3240
AATGGGAAAT GATTAAACGA TTGTCCACAC CATTTATGAA TTTGGATCAT GTTCAACAAT 3300
TTGAAAATGC GAGTACAGGT TCATATACAG CTTTAATTAG TAAAGAAGGC GACATGACAT 3360
ATGGCTTaGC AGATATGGAA GTGTTTGACT ACATTACGCC TGAATTTTTA ATTAAGCGTT 3420
CACACTTATT GAAAAAGGCT AAGTGCATTA TTGTAGATTT GAATTTAGGC AAAGAGGCAT 3480
TAAACTTCTT ATCAAATCAA ATTAGTTATC ACCACGGTTT 3540
CTTCCCCAAA AAITjAAAAAl CATTACATGC TATTGATTGG ATTATCACGA 3600
ATAAAGATGA AACAGAAACA TACTTAAATT TAAAAATAGA ATCTACTGAT GATTTAAAAA 3660
TAGCTGCTAA ACGCTGGAAT GATTTAGGTG TTAAAAATGT TATTGTGACA AATGGCGTGA 3720
AAGAACTCAT TTATCGAAGT GGTGAGGAAG AAATCATTAA GTCAGTTATG CCATCAAATA 3780
GTGTGAAAGA TGTTACAGGT GCAGGCGATT CATTCTGTGC TGCAGTAGTG TATAGCTGGT 3840
TAAATGGGAT GTCTACTGAA GATATATTAA TTGCTGGTAT GGTTAACGCA AAGAAAACGA 3900
TAGAAACGAA ATATACAGTT AGGCAAAACC TAGATCAACA GCAACTTTAT CACGATATGG 3960
AGGATTATAA AAATGGCAAA TTTACAAAAG TATATTGAGT ATTCTCGAGA AGTTCAGCAA 4020
GCACGGGAGA ACAATCAACC GATTGTAGCA TTAGAATCAA CAATTATTTC GCATGGTATG 4080
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GCCATTCCAG 


CAACCATAGC CATTATAGAT GGCAAAATTA AAATTGGTTT AGAAAGCGAA 4200
GATTTAQAAA TACTGGCAAC TAGTAAAGAC GTTGCTAAAG TATCTAGAAG GGATTTAGCA 4260
GAAGTTATTG CGATGAAGTG TGTTGGTGCT ACTACTGTAG CGACGACGAT GATATGTGCT 4320
GCAATGGCTG Wirt* 4 WV\I A TTTTGTTACA GGAGGTATTG GGGGCGTCCA TAAAGGTGCA 4380
GAAPATACGA Tfin a r* 21 tttp luuMUn i. 1 1 1~ AGCAGACTTA GAAGAACTGT CTAAAACAAA TGTCACTGTT 4440
AATTTTAGAC TTACCTAAGA CGATGGAGTA TTTAGAAACA 4500
CAGTTATTGG ATATCAAACG AATGAATTGC CAGCATTCTT CACTCGCGAA 4560
AGCGGTGTTA AGTTAACAAG TTCGGTTGAA ACGCCAGAAC GACTTGCTGA CATTCATTTA 4620
ACAAAACAGC AGTTAAATCT TGAAGGTGGC ATTGTTGTTG CTAATCCAAT TCCATATGAG 4680
CATGCCTTAT CAAAAGCATA TATTGAGGCA ATCATAAATG AAGCTQTTQT TGAAGCGGAA 4740
AATCAAGGTA TTAAAGGTAA GGACGCCACA CCGTTCTTGT TAGGGAAAAT TGTAGAAAAA 4800
ACGAATGGTA AAAGTTTAGC AGCAAATATA AAACTTGTTG AAAACAATGC GGCGTTGGGT 4860
GCTAAAATTG CTGTCGCTGT TAATAAATTA TTGTAGGTGA TGATACATGA ATATTTTATT 4920
CGCTATCACA GGGATAGCAT TTGCACTATT TGTTGCGTTT TTATTCAGTT TTGATCGTAA 4980
AAAAATAGAC TTCAAAAAGA CGTTAATAAT GATATTTATT CAAGTGTTGA TCGTGTTATT 5040
TATGATGAAC ACAACGATTG GTTTGACAAT TTTAACTGCA CTAGGTTCAT TTTTTGAAGG 5100
GCTAATAAAT ATTAGTAAAG CAGGCATAAA TTTTGTTTTT GGAGATATAC AAAATAAAAA 5160
TGGCTTTACG TTCTTTTTAA ACGTATTACT GCCATTAGTT TTTATTTCTG TATTAATAGG 5220
CATCTTTAAT TATATTAAGG TATTACCATT TATTATCAAA TATGTAGGTA TCGCTATTAA 5280
TAAAATAACT AGAATGGGGC GCTTAGAAAG TTATTTTGCT ATTTCAACAG CAATGTTTGG 5340
IjIAIATTTAA CAATAAAAGA TATTATTCCA AGATTATCTA GAGOGAAATT 5400
HiniALAAl 1 *jt»<J>l(_tj 1 (- lXs GTATGAGTGC TGTTAGTATG GCAATGCTAG GTTCATATAT 5460
GCAGATGATT GAACCCAAGT TCGTAGTTAC AGCAGTAATG TTAAATATTT TTAGTGCGCT 5520
TATCATCGCC AGTGTAATCA ATCCCTATAA ATCTGATGAT ACTGATGTTG AAATTGATAA 5580
CTTAACGAAA TCCACAGAAA CTAAAACATT GAATGGAAAA ACAGGAAAAC CTAAGAAAGT 5640
TGCCTTTTTC CAAATGATTG GTGATAGTGC GATGGATGGG TTTAAAATCG CTGTTGTAGT 5700
AGCCGTAATG TTGTTAGCAT TTATTTCATT AATGGAAGCA ATTAATATCA TGTTTGGTAG 5760
TGTTGGTTTG AACTTTAAAC AGCTTATTGG CTATGTGTTT GCACCAATCG CATTCTTAAT 5820
GGGGATTCCA TGGAGCGAAC TGTTCCAGCT GGCTCTTTAA TGGCGACTAA ATTAATTACA 5880
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CAAGGTATCA TTTCAGTTTA CTTAGTAAGc 

TTCGCTAATT TTGGTACGGT TGGTATCATC 6000
GTAGGTTCAA TTAAAGGCAT TAGTGATAAA CAAGGAGAAA AAGTTGCATC CTTTGCAATG 6060
AGGTTGCTAC TTGGTTCAAC TCTAGCTTCA ATCATTTCAG GATCAATCAT TGGCTTAGTA 6120
TTGTAAATGA ATCGAAGTAC CTAAATTAAA TTCATGGCAA AGCTAAACCC CGTCACCAAG 6180
TTGGCGCAAC AGCGcATgCA TAACTTAGTG ACGGGGTTTT ATCATAACAA TCTACTTTTT 6240
CGTAGCCGTT TTTGAAATGT ATGTTGATGG TTTATCTTTT TCAAAAATTG TTAATCCCGT 6300
TATATCTTTT TTATGTTTTG AAGGGACAAT GAAGCTAAGT ATATAAGCAA AGACAAAAGC 6360
AACTGTAAAT GAAATGGTAG ATACATAGAA AGGTGAGTTA CCTTTGCCAA CACCATTATA 6420
GACATAAGCA AAGATGATAC CCAATATTAA TCCACAAATA ACACCGAATG TATTCGTACG 6480
TTTAGTGAAA ATACCAACTG CAAATACACC AGCCAATGGA ACGCCGAATA ATCCAGTCAC 6540
AAACAAGAAT AAATCCCATA AGTCATTTGA ATTAGAAGCA ATTAAGTATA GTGACATTCC 6600
AAAACCGAAA ATACCTGCAA TGATAATAAT GAAACGTGCA AAGTTAACTT OGTGTCGCTC 6660
GCTACCTTTT CCGAAGAAGC GTTGCTTAAT GTCGATTGAA ATACAAGCAG ATATAGAATT 6720
TAAACTAGAT GAAATGGTAG ACTGTGCAGC GGCGAAAATG GCTGCAATAA GTAATCCTGC 6780
TACAAATGGT GGCATCTCAG TCAAAATGAA ATATGGCACT ACAGATGATG TATTGAAGCC 6840
TTTTGGTAAA ACAGCTTCAT GTGTATAAAA TGAATACAGC ATTGTACCCA TACCATAAAA 6900
TAAGGGTGCT GAAATTAAAG CTAGGATACC ATTTGTCCAT AACGATTTAT TTGTTTCTTT 6960
TAAACTATCA GAAGCTTGAT AACGCTGCAC GACGTCTTGA CTCGCTGTGT ATTGATACAA 7020
GTTGTTGAAA ATATTTCCTA GGAAAATAAT TGGAATGGCA GCTGCCGCAG TATTTAGTTT 7080
CCAATTGTCT GCACTAATTA ATTTTTTGTG CTCAATCGCA TCTGCAAAGA CAGTGCCGAA 7140
ACCG&CTTTA ATGTTCACAA CACCTAGAAT AATAATAACT AAAGCGCCGC CTAATAAAAT 7200
GACGCCTTGA ATGAAATCAC TCCAAACCAC ACCTTCGAAA CCACCTAAAA ATGTATATAA 7260
AATACATAGT AAACCAACGA GTGATGCAAC GATATAAGGG TTCATGTCTG ATACAGATGT 7320
GATTGCTAAT GTTGGTAAGT AGATAACAAT TGCAACACGC CCTAAATGGT AAACGACAAA 7380
TAATAATGAG CCAATGACAC GTATGCTAGG GCCAAATCTA GCTTCTAAAT ATTCATATGC 7440
AGATGTTACC TTTAACTTTT TAAAGAAAGG GACATAGAAA TAAATAAGTA ATGGAATAAT 7500
TGCGACGATA GCAATGTTAC CAGCGATATA TGACCAATCT GTTAAAAATG CTTTCTCTGG 7560
TGTCGACATA AATGTAATCG CACTTAACGT AGTAGCATAA ATTGAAAAGC CAACTACCCA 7620
AGATGGCAAG CGACCACTTG CGGTAAAGAA ACTATTGGTA CTTTGGCTCG CGCGCTTGGT 7680
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TGTGCCAAAT CCAACTTCTT TCATGGGCAA CATCCCCTTT ACAATGTATT GATTCTTTGA 7800 

TGTCTATAAA TCGTATTTTG CAATGAGTTG ATCTAATGTT TGTCGATGTG CTTCGTTAAA 7860 

AGGTTTGAAA GQTCTTTTCG GTAATCCTGC ATCAATGCCA CGATGACGTA ATATTTCTTT 7920

CAATGTTGGA TAAATCCCCA TTGATAACAC TGTTTCGATA ATGTCGTTTG AATCATGTTG 7980 

CAGTTGGTAA GCTTCTTGAA TTTGACCTTG TCGTGCTAAG TCGAAGATTT TTCTTGCACG 8040

GCGACCATTA ACGTTATATG TAGAACCAAT TGCACCATCT ACGCCAGAAA TCGTAGCTTG 8100 

AACTAACATT TCATCAAAGC CAGATAAGAT TAATTTGTCT GGGAATGCTT TTCTAATACQ 8160 

TTCGAGTAGG AAGAAGTTTG GCGCTGTATA TTTAACACCA ACAATTTTTT CATGATTAAA 8220 

TAGCTCGCTG AATTGTTCAA TAGAAATATT CACACCTGTT AAATCTGGTA TTGCATAAAT 8280

AATCATATTG TTCTGAGTTG CTTCGATAAT ATCGAAATAG TAATCTCTAA TTTCTTCAAA 8340

AGTAAATGGA TAGTAGAATG GTGTTACGGC AGAAAGTGCA TCATAACCGA GTTCTGTGGC 8400

ATATTTTCCA AGTTCAATGG CTTCATTTAA ATCTAACGAA CCTACTTGAG CAATCAATTT 8460 

CACTTTATCC CCAACTGCCT CTTTGGCAAC CTTGAAAACT TGCTTCTTCT GCTCTGTATT 8520 

TAATAAAAAG TTTTCGCCTG AGCTACCATT TACATAAAGA CCGTCTAATT CTTCAGTTTC 8580

AATGGCATTT TGAGCAATTT GTTTAAGTCC TTGTTGATTT ACTTGACCAT TTTCATCAAA 8640

AGGAACGAGT AACGCTGCAT ATAAACCTTT TAAATCTTTG TTCATTATGA AGTCCCTCCA 8700

AAAATCATTT GATAATATAG TTTACAGCTA TAATTGTAAA CGCTATCATA AAATGTAACA 8760

ATATCTTTTT GAAAATTGTA GTCATATTTA TGTATAATTA ATGAAAATGT TTTTCAAAAT 8820

CAATAGAAAT GGAGTGAGTA AGGTGTATTA CATCGCAATC GATATTGGAG GCACTCAAAT 8880

TAAATCGGCA GTTATTGATA AGCAATTGAA TATGTTTGAC TATCAACAAA TATCAACGCC 8940

GGACAACAAA AGTGAGCTTA TTACTGACAA AGTATATGAG ATTGTAACAG GATATATGAA 9000

GCAATATCAG TTGATCCAAC CTGTCATAGG TATTTGATCA GCAGGCGTTG TTGATGAACA 9060

AAAAGGCGAA ATTGTATACG CAGGGCCAAC CATTCCGAAT TATAAAGGTA CTAATTTTAA 9120

GCGATTATTA AAATCACTGT CTCCTTATGT CAAAGTAAAA AATGATGTAA ACGCTGCATT 9180

ACTAGGCGAA TTGAAATTAC ATCAATATCA AGCAGAACGG ATCTTTTGTA TGACGCTTGG 9240

TACAGGCATT GGGGGTGCGT ACAAGAATAA TCAAGGTCAT ATTGATAATG GTGAGCTTCA 9300

TAAGGCAAAT GAAGTTGGGT ATTTATTGTA TCGTCCAACT GAAAATACAA CGTTTGAGCA 9360

ACGTGCTGCA ACGAGTGCAT TGAAAAAGCG CATGATTGCC GGAGGATTTA CGAGAAGCAC 9420

ACATGTGCCA GTATTGTTTG AAGCAGCTGA AGAAGGTGAT GATATTGCAA AACAAATATT 9480
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AGGGCTTATA TTAATTGGGG GCGGTATATC TGAACAAOGA GATAATCTCA TTAAATATAT 9600 

CGAGCCGAAA GTTGCACACT ATTTACCAAA AGACTATGTT TATGCACCAA TACAAACGAC 9660 


TAAGAGTAAA AATGATGCAG CATTATATGG CTGTTTGCAA TGATAGTTGA AAGAAGGAGT 9720 

CATTCTAAAA TAGAATTTGA AACCGTTACG AGAGATGAGA GCTGTTGTTA GTTCCACACA 9780 

TCACACTCTA TCTAGGACCA ATCTAAACTA TATCAACCAA cAGTGTGCCA CGGGCAAATT 9840 


AAATTGAAGA AGCTGAGATA TTAAAATTTT AGAAAATGTA AAAAAATATT TGGTATTGAA 9900 

ATTAAAAAAG CACCTAGCAA CTCGTTGGGA CAATCACGAT GATTGTCTAC AGTTGCAGGT 9960 

GGATTTGAAT ATACTACTAG TTATTTGTTG TCTAGGATAA TAGATTTAGT ATGTTGATAA 10020

GTTTGACTCA GATTCGTATT TTCTAATAAA TGATAACTCA CGATATCGAT TAAAAAGAGT 10080 

GTCGCAATTT GTGTGTTGAT AAATTGATGG TCGGTATTAC GOGATTGATC CGTTGTTAAA 10140 

AGTACTAAAT CTGCACAATC TGTAAGTTTA CTACCTTCAA AATTTGTGAT GGCAACGACA 10200

TATGCACCAT GAGATTTGGC GACTTCCGCT GCAGAAATTA ATTCCGAAGT ATTACCACTA 10260

TTTGACATAG CAATAAACAT ATCCGAATGA GATAGTAGGG ATGCCGATAT TTTCATTAAA 10320

TGTGAATCGG TAGTAACATT ACCTTTTAGC CCCATACGAA TCATACGATA ATAAAATTCA 10380

GTCGCTGATA AACCAGAGCT ACCTAGTCCA GCAAAGAGTA TATGTCGACT TGATTGAAGT 10440 

TTGTCGATAA AGGTTTGGAT AATGTCGTTA TCAATAAATT CACCAGTTTG TTGAATGATT 10500 

TGTTGATGAT ATTTATGAAT TCTTTGAATA ATTGGGCTAT TTTCAATAAC TGTCTCTGTC 10560

ATTTCTTGTT GAATATTAAA TTTTAAATCT TGGAAATTCT CATAATCCAG CTTATGACTA 10620 

AAGCGTGTCA TCGTTGCTGG TGATGTACCA ATCGCATGGG CTAAGGAGTT AATCGTTGAA 10680 


AAGGCATCGC TATAACCATT TTGTCTTATA TAATTGACGA TGCGTTTATC AGTTTTTGTA 10740 

AATAAATGTT GATAACGTTG AACACGATTC TCAAATTTCA TTGTGTCACC CCTTCATCTT 10800 

AATGATTACT ATTATATATG AAAAATATTT TCAAGATAGT AAAAAGCATT GATAAAAATT 10860 

ATCTTAATGA TATATTGTAA ATGACTTTAC GTGAAAAAAC GACTTATGGA GTGAGGAATA 10920

ATGTTACCAC ATGGATTAAT AGTATCTTGT CAGGCACTAC CAGATGAACC ATTGCATTCA 10980

TCTTTTATTA TGTCGAAAAT GGCATTAGCT GCGTATGAAG GTGGTGCTGT TGGTATTCGC 11040

GCAAATACTA AGGAAGACAT TTTAGCAATT AAAGAAACGG TAGATTTACC AGTTATTGGC 11100

ATTGTGAAAC GTGACTATGA TCACTCAGAT GTTTTCATTA CTGCAACGTC AAAAGAAGTT 11160

GATGAACTGA TAGAAAGCCA ATGTGAAGTC ATTGCATTGG ATGCAACGTT ACAGCAACGT 11220

CCGAAAGAAA CGTTAGACGA ATTAGTATCA TATATTAGAA CACATGCACC GAACGTTGAA 11280
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TATATTGGCA 


CGACGTTACA TGGCTATACT AGTTATACGC AAGGACAATT ACTTTATCAA 11400
AATGACTTCC AAii-i"i"iAAA AGATGTACTA CAAAGTGTTG ATGCAAAAGT TATTGCGGAA 11460
GGTAATGTCA TTACACCGGA TATGTATAAA CGTGTGATGG ACTTAGGCGT TCATTGTTCA 11520
GTCGTTGGTG GTGCGATAAC ACGACCAAAA GAAATTACGA AACGTTTTGT TCAAATTATG 11580
GAAGATTAAA TGATAACGAT AAAAAAACGA GATGACCATC ATTAATTAAA GGCACCTAAT 11640
TATCTTAGGT GGCTGAATGA ATGTAATGGG TTCATCTCGT TTTGTTTGTT TATGATAGTG 11700
ATTTTATTTT CAACTTTATC CAAAAATAAG TAAAGCGACG GGGATGGTGA TTAATAGCGA 11760
CAACGCCACG CGTAAAAACC AAATGATGAT GAGTTTCCAG ACAGGTATTT TAATTTCAGT 11820
TGCTAGTATA CATGGCACTA ATGCTGAGAA AAAGATAATG GCTGATACGC TTACTACACC 11880
GACGACAAAT TTAGTACTCA TTGCAGCTTT AGTTACTAAC AAAGATGGTA GAAACATCTC 11940
TACAATAGAA AckCTGACGC TTTTGCTAGT AAAGCCTGAT CAGCAATTGG GAAAATATAA 12000
ATAAATGGAT AGAAGATATA GCCAAGCCAA TCAATGAATG GTGTATAGTT CGCTACAATC 12060
AGTCCTAAAA AACCAATCGA TAATATAGAA GGTAAAATAC CAACAGTCAT TTCTAAACCG 12120
TCTTTCAAAT TGTCCCAAAC GTTCTTCACG AGAGATGGTG TTAATGCATT TTGTTTCATC 12180
GCCTCTGCAT ATGCAGTTTT CAGTCTGCTT CCTTCAATAG CAACTTCTTG TTCTCCTTCT 12240
TGTCCGTTAT AATATTCTGT TGATTCATTG CTGATTGGCG GTAGCCATGC AGTAATTGCA 12300
GTCACGACAA ATGTGATGAC TAAAGTTATC CAAAAGTATA AATTCCAATG CGGCATTAAT 12360
CCTAAAGTTT TAGCAACGAT AATCATAAAA GTTGCTGAAA CTGTTGAAAA GCCAGTCGCA 12420
ATAATCGTGG CTTCTCGTTT GTTGTACATC CCTTGCTTAT AGACACGATT AGTAATCAAT 12480
AATCCTAAGG AATAACTGCC GACAAACGAA GCCACTGCAT CGACAGCGGA TTTTCCTGGT 12540
GTTETAAAAA TAGGTCTCAT AATAGGCTCC ATATAAACAC CGACAAATTC TAATAAGCCA 12600
TAGCCCACTA ATAAAGAAAG CGCAATTGCA CCTACTGGAA TTAAGATACT TAATGGCATC 12660
ATTAATTTTT CAAACAAAAA CGGACCATAG TTAGCTTTAA ATAGTATTGA TGGACCGATT 12720
TTAAATACAT ACATTATACC GATCATTGCA CCTGCAACTT TAAATAATGT AATGACCAAG 12780
TTTGTGATTG AAGTCATAAA AGTACGTCTC ACTATTGGTA ACGCTGTACC AATTAAAATC 12840
ATAATCAGTG CAACATAGGG CATAAGTGGA CCTATGATTG AGCGAATGGC TAGATGAAGA 12900
TGATCGACGA AAATAGTGTT GTTACCATTA ATCGTAAAAG GAATAAAGAA ACATAGTATG 12960
CCCACTAAAC TATAGACAAA AAAACGCCAT GCACTTGGTT GTTGTGCATT AGAATGATAT 13020
TGATTCATTA AAGCAACCCC TTTGTTTAAA TGAATACACA AAACTGTATG ATGCATCTTC 13080
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ATAGTTTGAA TTATTTTCAT ACCAATACAA 
ATTAACTAAT TATATATAGA TTGAAACTAT 13200
ATTACTTAAT AAAATATTTA TCTTAAATGT TGTTGTGTTG ATTCAACACC ACAACTAAAA 13260
GTGTTTATAA ATTATTTGGA AATACACATA TTTGTAAATG ATTAGTATCG ATTTAATATC 13320
GTATTATTAA ATTTTTATTA ATTTTGTAGT CTTAATCmAA AAATAATATA TGTCATGTTA 13380
TATTGAAGGT GCAGTTGTTT TTCATTCTCA AGAGGGGGTC AAAAAAATAC TTTTGAGGTG 13440
ATTATATGTT AAGAGGACAA GAAGAAAGAA AGTATAGTAT TAGAAAGTAT TCAATAGGCG 13500
TGGTGTCAGT GTTAGCGGCT ACAATGTTTG TTGTGTCATC ACATGAAGCA CAAGCCTCGG 13560
AAAAAACATC AACTAATGCA GCGGCACAAA AAGAAACACT AAATCAACCG GGAGAACAAG 13620
GGAATGCGAT AACGTCACAT CAAATGCAGT CAGGAAAGCA ATTAGACGAT ATGCATAAAG 13680
AGAATGGTAA AAGTGGAACA GTGACAGAAG GTAAAGATAC GCTTCAATCA TCGAAGCATC 13740
AATCAACACA AAATAGTAAA ACAATCAGAA CGCAAAATGA TAATCAAGTA AAGCAAGATT 13800
CTGAACGACA AGGTTCTAAA CAGTCACACC AAAATAATGC GACTAATAAT ACTGAACGTC 13860
AAAATGATCA GGTTCAAAAT ACCCATCATG CTGAACGTAA TGGATCACAA TCGACAACGT 13920
CACAATCGAA TGATGTTGAT AAATCACAAC CATCCATTCC GGCACAAAAG GTAATACCCA 13980
ATCATGATAA AGCAGCACCA ACTTCAACTA CACCCCCGTC TAATGATAAA ACTGCACCTA 14040
AATCAACAAA AGCACAAGAT GCAACCACGG ACAAACATOC AAATCAACAA GATACACATC 14100
AACCTGCGCA TCAAATCATA GATGCAAAGC AAGATGATAC TGTTCGCCAA AGTGAACAGA 14160
AACCACAAGT TGGCGATTTA AGTAAACATA TCGATGGTCA AAATTCCCCA GAGAAACCGA 14220
CAGATAAAAA TACTGATaAT AAACAACTAA TCAAAGATGC GCTTCAAGCG CCTAAAACAC 14280
GTTCGACTAC AAATGCAGCA GCAGATGCTA AAAAGGTTCG ACCACTTAAA GCGAATCAAG 14340
TACAACCACT TAACAAATAT CCAGTTGTTT TTGTACATGG ATTTTTAGGA TTAGTAGGCG 14400
ATAATGCACC TGCTTTATAT CCAAATTATT GGGGTGGAAA TAAATTTAAA GTTATCGAAG 14460
AATTGAGAAA GCAAGGCTAT AATGTACATC AAGCAAGTGT AAGTGCATTT GGTAGTAACT 14520
ATGATCGCGC TGTAGAACTT TATTATTACA TTAAAGGTGG TCGCGTAGAT TATGGCGCAG 14580
CACATGCAGC TAAATACGGA CATGAGCGCT ATGGTAAGAC TTATAAAGGA ATCATGCCTA 14640
ATTGGGAACC TGGTAAAAAG GTACATCTTG TAGGGCATAG TATGGGTGGT CAAACAATTC 14700
GTTTAATGGA AGAGTTTTTA AGAAATGGTA ACAAAGAAGA AATTGCCTAT CATAAAGCGC 14760
ATGGTGGAGA AATATCACCA TTATTCACTG GTGGTCATAA CAATATGGTT GCATCAATCA 14820
CAACATTAGC AACACCACAT AATGGTTCAC AAGCAGCTGA TAAGTTTGGA AATACAGAAG 14880
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10 






ATTTAGGATT AACGCAATGG GGCTTTAAAC AATTACCAAA TGAGAGTTAC ATTGACTATA 15000
TAAAACGCGT TAGTAAAAGC AAAATTTGGA CATCAGACGA CAATGCTGCC TATGATTTAA 15060
CGTTAGATOG CTCTGCAAAA TTGAACAACA TGACAAGTAT GAATCCTAAT ATTACGTATA 15120
CGACTTATAC AGGTGTATCA TCTCATACTG GTCCATTAGG TTATGAAAAT CCTGATTTAG 15180
GTACATTTTT CTTAATGGCT ACAACGAGTA GAATTATTGG TCATGATGCA AGAGAAGAAT 15240
GGCGTAAAAA TGATGGTGTC GTACCAGTGA TTTCGTCATT ACATCCGTCC AATCAACCAT 15300
TTGTTAATGT TACGAATGAT GAACCTGCCA CACGCAGAGG TATCTGGCAA GTTAAACCAA 15360
TCATACAAGG ATGGGATCAT GTCGATTTTA TCGGTGTGGA CTTCCTGGAT TTCAAACGTA 15420
AAGGTGCAGA ACTTGCCAAC TTCTATACAG GTATTATAAA TGACTTGTTG CGTGTTGAAG 15480
CGACTGAAAG TAAAGGAACA CAATTGAAAG CAAGTTAAAT TCATCTTCTG AATTTAATAT 15540
GCTATGTAAA TCGTGCTGTT ATCATGGCAC ATCAGATATA AGTAGCATCA CAGTGTTGAA 15600
TTTAAAAATA GTAAAGTGAA ATAAAGCGCC TGTCTCATTA GCGAAAACTA AAGGGACAGG 15660
CGTATCTGTT TATGAGCTTA ATAAATTGTA TGAATAATAT GGTTGATCGA ATAACTGTTT 15720
ATCATGATGA TAAATTGAGT TTTTTAAAAT AATGATATAT TACATCATTG TTATAGCGTT 15780
TAAGAAATCA ACAACTTTAC GATAAATAGT GATTGCTTCG TCATTAGGTC TACGATCAAA 15840
ATCATGCTCG TTTTTATTCA CGCGTTCAAA TGTTGAATGT GGAACATGAT TCATGATATG 15900
TTCGCTTTCC TCAACGGGAA CATCATAATC GCCATTACAA TGCGCAATGA AAACAGGTGG 15960
AAGTGTTTTA AGTTCATCTG GTGCAATATT ATATTTTGAA TTAGTATAAT CAGCAATGTT 16020
AATCATATTT ATCCATTTAC CTGTGCCACG TGCATAAACG TAGATTAAAA AACGTTGTGC 16080
GATTTGATCT TGAACAACCG GTGTTGGTGA AGTGAGTTGT GCAATCATTG TTTCGTTTAC 16140
GCTTTGAGCT ATTTTTGCGT AATAACTATT AGTTGTTTTA AAAGGTTCAG TGTTGATGCG 16200
ACTATAACCA TAAAAATCAA TAACACCATC AATATCTCTG TCTCGTGCAA TTAATAGACT 16260
TAAATATGCA CCTGATGATC TGCCAAAGGT AAAAATAGGG CAATTAGAAT ATTGTGATTG 16320
AATCGCATCG AATGAtGCgn AGnACATCCT CAATAATGCA ATCGAGACTT ACTTCTGGTA 16380
ATAAACGATA ACTTAGTTGA ATTAAATCGT AATGTTCCGT AAgATATCGA TATACTGTGG 16440
GGATAAATCG TTAGCTTTAC CGAACATTAA TCCACCACCG TGGATGTAGA CAATAGCGCC 16500
TTTTGTTGGT TGArrn-iTO CTTTAATAAT TGTGTAAGGT AATGCAAATG CATCTTTAGT 16560
AATTACTTTA TCTTTAATTT CAGTCACGAT TTAATAGGCT CCTTATTnT GATATTGATG 16620
TCATTATAAC ACTGTCTTAA ATTTCCATGA AAAATAGTCT TAAGACGATG AGTCATGATA 16680
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CATCATTTTA 


ACAATATCTT TAAAAGCAGC ATGTGGAATG GCTAAATCTT CTAAATCTGC 16800
CATAGAAAAT TCAAGATTGA TATCATGTGG TCGCTGTTCA GCAAGTTTAT GCACAAAGTC 16860
AGGTTCTGTG ACAAAAGGCG AAGACATGCC GACCATATCT GCATGTTGTA AAGCATCTAA 16920
AGCAGACTCT GGAGAATTAA TCCCGCCACT TGCAATTAAA GGGATACGAC CTGCTAAATG 16980
TTCATAGACA ATTTGGTTAA CTGGTCGACC GAAATGATCA CCTGGTGTAC GAGACGTATT 17040
TTGATAAATA TGTCGACCCC AGCTAGOGAT TGCTAAGTAT TGGATGTTTG AAACGTCCAT 17100
GACCCAATTG ATTAATTGGT TGAACTCGTC AATGGTATAT CCTAAATCAC TGCCTCTGGT 17160
TTCTTCTGGC GTTGCTCGAA ATCCTAAAAT AAAATTGTCA GGTGCTTCTT TATCAATCAC 17220
TTCTTGTACC GCACGCATAA CTTCTAAACA TAATCTTGCA cgatttttta atgagtcggc 17280
ACCGTAATGG TCTGTACGTT TATTCGAAAA AGTTGAGAAA AATGTTTGAA TCAGCAAACG 17340
TTGTGCAATC GAAATTTCCA CACCATCAAA ACCTGCTTTA ATCGCGCGTA ATGTAGCATC 17400
GCGATACTGC TGAATGATGC TATTGATTTT CTCATGAGAC ATGGCGATAA CATCGTGTTC 17460
AATCGGTGAA TGCAATGTCA TAGGGCTTGG TCCATACACC TTTCCAAAAT TTAAAATGGC 17520
TTGATTTGAA AAACGACCAG CATGCGCTAg CTGGATAATA GCGAGGCTAC CATGTTGTTT 17580
CATCGTAGAT GCCATGTTAG TTAATCCAGG GATACAAGCA TCATGATCAA TATTAAAGCC 17640
ATATTCAAAC AATTGACCAT AAGGTTCAAT GTAAGCAGCG CCGGTGACTT GCATTCCAGC 17700
TGAATTAGAG CGACGTGCAG CATAAGCCAA GTCTTCTTTT GTAATATAGC CTTCTTTTGT 17760
TGATGTGTTT ACGGTCATTG GTGATAATAC AAAGCGATTC GAAATTTTGA TGCCATTAGG 17820
TAAGTGGATT GATTGTAAAA GTGGTTTGTA TCGGTACATA CTATGATTCC TTTTCTATTC 17880
AATATTGTTT TCAAAGTACC ATGGAAAGAA TGAATAATCA ATGATGAACA GTCTTGATAG 17940
AATAGAATTG GTACATGGAA AGTATTTTTA AAATTAAACT AATGAATGGC ATTTGTAGGT 18000
CTGAAAATAT GAATATGAAA AAGAAAAATA AAGGCGAAAA GATATAAAAG TTAATTGAAA 18060
AACGTTATCA TATACGTGGG TATATGAAGA GGGAATGGTA TTAAGAACGC TAAAATGTTA 18120
TGTCGGTTTG ACATGACAGG ATAAGTTTGG AGATGACGGA TTGGTTAAAT TAAGCGTATT 18180
AGACTATGCC TTAATAGATG AAGGTAAGGA TGCACAAAAG GCATTGCAAG ATTCAGTGAC 18240
ACTTGCAAAA TTAGCAGATC GACTTGGCTT TAAGCGAATT TGGTTTACGG AACATCATAA 18300
TGTACCAGCG TTTGCGTGTA GTAGTCCAGA ACTTTTGATG ATGCATACAT TGGCGCAGAC 18360
AAATCACATA CGAGTTGGCT CTGGTGGTGT GATGCTGCCG CACTATCGAC CTTATAAAAT 18420
TGCTGAGCAT TTTAGAATGA TGGCAGCGTT ATATCCAAAT CGTATTGATT TAGGTATTGG 18480
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10 



1S 



20 



25 



35 



45 



50 



TAGTTACGAT GAATCGATTT CGTTATTACG TGATTATCTT ACAATAAAGG ATAAACCAAG 18600 

TGCGCATACG TTAGGTGTCC AACCACACAT TGATCATTTT CCAGAAATGT GGTTATTAAG 18660 

TAGTAGCGCA ACATCTGCCA AAATAGCTGC CGAACTAGGT ATAGGGCTTT CTGTTGGAAC 18720 

ATTTTTGCTA CCAGATATAA ATGCGATACA TACAGCGAAG GATAACATTG ATATTTACAA 18780 

AAAACATTTC CAAGCATCAA CGATTAAAAT GGACGCAAAG GTGATGGCAT CTGTATTTGT 18840 

CATTGTAGCT GATAACGAAG CGGAAGTAGC AGCATTACAA CATGCCTTAG ATGTTTGGTT 18900 

ATTAGGTAAA TTACAATTTG CAGAATTTGA AGATTTTCCT TCAGTAGACA CAGCACAAAA 18960 

GTATAAGCTT AATGATCGAG ACAAAGAGAT GATTCAAGCA CATCAAGCAC GCATCATTGC 19020 

AGGTACACAA GAAAAGGTTA AAGCACAATT AGATGATTTC ATTGCTACGT TTGAAGTTGA 19080

TGAGGTGTTA GTAGCACCGC TTATTCCAGG TATTGAACAG CGTTGTAAAA CATTAAAATT 19140 

ACTCGCGGAA ATTTATTTGT AGCATTTTAA ATAGAAGAGA AAGGATGAAG ATAAGATGAA 19200 

AAAGTTAGCC AATTATTTAT GGGTAGAAAA AGTAGGAGAT TTGTATGTGT TTAGTATGAC 19260 

ACCTGAATTG CAAGATGATA TTGGGACAGT AGGTTATGTT GAATTCGTAA GTCCAGATGA 19320 

AGTTAAAGTG GATGATGAAA TTGTGAGTAT CGAAGCATCG AAAACGGTCA TTGATGTGCA 19380 

AACGCCATTG TCAGGAACGA TTATTGAGCG AAATACAAAA GCGGAAGAAG AACCGACAAT 19440 

TTTAAACTCT GAAAAACCAG AAGAAAATTG GTTGTTCAAA TTGGATGATG TCGATAAAGA 19500 

AGCATTCCTA GCATTACCGG AGGCTTAAAT GGAAACGTTA AAATCAAATA AAGCGAGACT 19560 

TGAATATTTA ATCAATGATA TGCATCGAGA GAGAAATGAC AATGACGTAT TGGTAATGCC 19620

ATCTTCATTT GAAGATTTGT GGGAATTATA TCGAGGCTTA GCAAATGTCA GACCGGCATT 19680 

ACCTGTAAGT GATGAATATT TAGCTGTACA AGATGCTATG TTAAGTGATT TGAATCGTCA 19740 

ACATBTTACG GATTTGAAGG ATTTGAAGCC GATAAAAGGT GACAATATCT TTGTTTGGCA 19800 

AGGTGATATC ACGACGTTAA AAATCGATGC TATTGTTAAT GCTGCAAATA GTCGTTTTCT 19860 

AGGATGTATG CAAGCTAATC ATGACTGCAT TGATAATATT ATTCATACAA AAGCGGGTGT 19920

TCAAGTTCGA CTTGATTGTG CAGAGATCAT TCGACAACAA GGGCGCAATG AAGGTGTAGG 19980 

TAAAGCCAAA ATAACACGTG GATATAATTT GCCAGCAAAG TATATAATTC ATACGGTTGG 20040 

TCCGCAAATA CGTCGATTGC CTGTTTCAAA GATGAATCAG GACTTGTTAG CTAAATGTTA 20100 

TCTTAGCTGT CTTAAATTGG CTGATCAACA TAGTTTAAAT CATGTCGCTT TTTGCTGTAT 20160 

ATCTACAGGT GTATTTGCTT TTCCTCAAGA TGAAGCAGCA GAAATTGCTG TTCGAACAGT 20220 

AGAAAGCTAT CTCAAAGAAA CAAATTCAAC ATTGAAAGTC GTGTTCAATG TATTTACAGA 20280

CAATGTCTCT GTTAATGGAT GACAAGACAA AGCAGGCTGA AGTATTGCGT ACTGCGATTG 20400

ATQAAGCAGA TGCGATAGTG ATTGGAATTG GTGCAGGCAT GTCTGCATCT GACGGATTTA 20460

CATATGTAGG AGAGCGTTTT ACGGAAAATT TCCCAGATTT TATTGAAAAA TATCGCTTCT 20520

TTGATATGTT GCAAGCGAGT TTACATCCTT ATGGCAGTTG GCAAGAGTAT TGGGCATTTG 20580

AGAGTCGTTT TATTACATTA AACTATTTAG ATCAACCTGT AGGTCAGTCT TACCTCGCTT 20640

TAAAATCCTT GGTGGAAGGT AAACAGTACC ACATTATAAC TACGAATGCA GATAATGCTT 20700

TCGATGTAGC TGATTATGAT ATGACTCATG TATTTCATAT ACAAGGGGAG TATATACTGC 20760

AACAGTGTAG cTCAGCATTG TCATGCTCAA ACGTATCGCA ATGATGATTT AATTCGTAAA 20820

ATGGTTGTTG CGCAACAAGA TATGCTTATA CCTTGGGAGA TGATTCCAAG ATGTCCAAAA 20880

TGTGATGCCC CAATGGAAGT GAATAAACGT AAAGCGGAAG TTGGGATGGT TGAAGATGCT 20940

GAATTTCATG CGCAACTACA TCGTTATAAT GCTTTTCTAG AGCAACATCA AGATGATAAA 21000

GTGTTGTATT TGGAAATTGG AATTGGTTAT ACTACACCAC AATTTGTGAA GCATCCTTTT 21060

CAGCGTATGA CACGTAAAAA TGAAAATGCC CTTTATATGA CGATGAATAA AAAGGCATAT 21120

CGCATTCCGA ATTCAATTCA AGAACGTACC ATACATTTAA CTGAGGATAT CTCAACATTG 21180

ATTACAGCAG CACTCCGGAA CGACAGCACA ACGAAAAATA ACAACATTGG AGAGACAGAA 21240

GATGTACTTA ATAGAACCGA TTAGAAATGG AGAATATATT ACTGATGGTG CGATTGCACT 21300

CGCTATGCAA GTTTATGTTA ACCAGCATAT CTTTTTAGAT GAAGATATTT TATTCCCTTA 21360

TTATTGTGAT CCAAAAGTGG AAATTGGACG TTTTCAAAAT ACTGCTATAG AAGTGAATCA 21420

AGATTATATA GATAAACACA GTATTCAAGT AGTTCGCCGA GATACTGGTG GTGGCGCTGT 21480

GTATGTTGAT AAAGGTGCCG TTAATATGTG TTGTATTTTA GAACAAGACA CTTCAATTTA 21540

tggtSatttt CAACGATTTT ATCAACCAGC TATAAAGGCG TTGCATACAT TAGGTGCAAC 21600

AGATGTGGTA CAAAGCGGTA GAAATGATTT AACATTGAAT GGTAAAAAAG TGTCAGGCGC 21660

CGCAATGACA TTAATGAATA ATCGTATTTA TGGCGGTTAT TCGCTATTAC TTGATGTTAA 21720

TTATGAAGCA ATGGATAAAG TGTTAAAGCC TAATCGCAAA AAGATTGCAT CGAAAGGGAT 21780

TAAATCTGTG CGCGCACGTG TTGGTCATCT TAGAGAAGCA CTGGATGAAA AGTATCGTGA 21840

TATAACCATT GAAGAATTTA AAAATTTAAT GGTGACGCAG ATTTTGGGAA TCGATGACAT 21900

TAAAGAGGCG AAACGATATG AATTAACGGA TGCAGATTGG GAAGCGATTG ATGAATTAGC 21960

TGATAAAAAG TATAAAAATT GGGATTGGAA TTATGGCAAG TCACCCAAAT ATGAATACAA 22020

TCGAAGTGAA AGATTATCTT CAGGTACGGT AGACATAACA ATTTCTGTTG AACAAAATCG 22080

AOAAOCATTA CAAGGAACAA AAATGACAAG AGAAGATTTA ACGCATCAGT TAAAGCAATT 22200

AGACATCGTT TATTATTTTG GCAATGTTAC GGTAGAAGCA TTAGTGGATA TGATTTTAAG 22260 

TTAATATTGT TATTTTATGT ATGCTGAATC ATTGGAAGTG TTTGCTTGCT CTTGAAAAGG 22320

TGACAATAGT GTTTGGTGAA GGTTGAACAT ATGAGTGGAA ATTATTGCCT TTAACTATTC 22380 

AAAGTATGAT ATATATATGG TTTTTGTTTC TAAATGATTG GGTATTTGAA AATAGATGAG 22440 

TTTAATATTT TAAGGAATAT AATGATGTTT ACTTTTATAA TTCATATAGA ATATTAAGCA 22500 

ATATAAGTCT GTTGATATAT ACAAAATATA ATGACTGCTA TAATGAGTAA TCAATAGACA 22560 

CAAAGAGGAG ATTATGTGAT GAATAATAAA GTATTAGTAA CCGGTGGTAC AGGGTTTGTT 22620 

GGCATGrCGAA TTATTTCACG ATTATTAGAA CAAGGTTATG ACGTACAAAC GACOATACGT 22680 

GATTTAAGTA AAGCTGATAA AGTAATTAAA ACAATGCAAG ACAATGGCAT TTCCACAGAG 22740 

CGATTAATGT TTGTCGAAGC GGATTTATCA CAAGATGAAC ATTGGGATGA AGCAATGAAA 22800

GATTGCAAGT ATGTCTTGAG TGTAGCATCT CCGGTGTTTT TCOGTAAAAC AGACGATGCA 22860 

GAAGTGATGG CGAaCTGcAA TTGAAGGTAT ACAACGTATT TTAAGAGCTG CAGAACATGC 22920 

GGGTGTTAAA CGTGTGGTAA TGACTGCAAA CTTTGGTGCA GTTGGTTTTA GTAATAAAGA 22980

TAAAAATTCA ATCACAAATG AAAGTCATTG GACAAATGAA GATGAACCAG GCTTATCAGT 23040 

ATATGAAAAA TCAAAATTGT TAGCTGAAAA GGCAGCGTGG GATTTTGTTG AGAATGAAAA 23100 

TACAACAGTA GAATTTGCCA CAATCAATCC AGTTGCAATT TTTGGGCCAT CATTAGATGC 23160

ACACGTTTCA GGAAGCTTTC ATTTATTAGA AAATTTATTG AATGGTTCAA TGAAACGTGT 23220 

ACCGCAAATT CCGTTAAATG TTGTTGATGT GAGAGACGTA GCTGAACTGC ACATTTTGGC 23280

AATGACAAAT GAACAAGCTA ATGGCAAGCG ATTTATTGCG ACGGCTGATG GACmAATTwA 23340 

tTTGTTGGGA ATTGcCAAAt TAATTAAAGA AAAGGGCCTG GAAATAGCTC CAAAAGTTCC 23400

TACTAAAAAA TTACCCAGCT TTATTTTGAG CnAnGnGCC 23439 

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4522 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

CCCTTTGAGA GTATATCATC TAGTCAAATT ATGCCTGTCA TTAGAGCGAC TAGCTTTGAT 60

TATTATGCAG TCGATTTAGG GAAATCATAT CGTCTAATTG ACGAAAGCAT GTTAGAGGAT 180

TTGAAGTTAA CTGAACAACA AATAAGAGAA ATGTCTCTGT TTAATGTTAG AAAATTGTCA 240 

AATTCATATA CGACTGATGA AGTAAAAGGT AATATTTTTT ATTTTATTAA CTCAAATGAC 300 

GGGTATGATG CAAGTAGGAT ACTAAATACT GCATTTTTAA ATGAAATTGA GGCACAATGT 360 

CAAGGCGAAA TGCTCGTAGC AGTGCCACAC CAAGATGTGT TAATTATTGC AGATATACGC 420 

AATAAAACAG GATATGATGT GATGGCACAT TTAACAATGG AATTTTTCAC TAAAGGTCTA 480

GTTCCAATTA CATCATTATC CTTTGGATAT AAACAGGGTC ATCTTGAACC GATATTTATT 540 

TTAGGTAAAA ATAATAAACA AAAAAGAGAT CCAAACGTGA TTCAGOGTTT AGAAGCAAAT 600 

CGTCGTAAAT TTAATAAAGA TAAATAGAAA TAATTGGATA AGGAGTTTTG TCATAATGAA 660 

TTTATTTTAC AATCCTAAAT ATGTAGGAGA TGTCGCATTT TTACAAATTG AACCAGTTGA 720 

AGGTGAATTA AACTACAATA AAAAAGGTAA TGTTGTTGAA ATTACtAATG AAGGTAATGT 780 

TGTAGGTTAT AATATTTTTG AAATTTCAAA AGATATAACA ATTGAAGAAA AAGGTCATAT 840 

TAAATTAACT GATGAACTTG TAAATGTATT CCAAAAGCGT ATTTCAGAAG CTGGTTTTOA 900 

TTATAAATTA AATGCTGATC TATCACCGAA ATTTGTAGTT GGCTACGTTG AAACTAAAGA 960

CAAACATCCT GATGCAGATA AATTAAGTGT ACTAAATGTA AACGTTGGAA ATGACACATT 1020 

ACAAATTGTA TGTGGCGCGC CTAACGTTGA AGCTGGACAG AAAGTTGTTG TTGCTAAAGT 1080 

AGGTGCAGTG ATGCCTAGCG GTATGGTAAT TAAAGATGCT GAATTACGTG GTGTTGCCTC 1140

AAGCGGTATG ATTTGTTCAA TGAAAGAATT GAATTTACCT AATGCACCTG AAGAAAAAGG 1200 

TATTATGGTA TTAAATGACA GCTATGAAAT TGGACAAGCA TTtTTTGAAT AATTAAGGAA 1260 

GGTAGTGAAA ATATGAGCTG GTTTGATAAA TTATTCGGCG AAGATAATGA TTCAAATGAT 1320 

GACTTGATTC ATAGAAAGAA AAAAAGACGT CAAGAATCAC AAAATATAGA TrACGATCAT 1380 

GACTCATTAC TGCCTCAAAA TAATGATATT TATAGTCGTC CGAGGGGAAA ATTCCGTTTT 1440 

CCTATGAGCG TAGCTTATGA AAATGAAAAT GTTGAACAAT CTGCAGATAC TATTTCAGAT 1500 

GAAAAAGAAC AATACCATCG AGACTATCGC AAACAAAGCC ACGATTCTCG TTCACAAAAA 1560 

CGACATCGCC GTAGAAGAAA TCAAACAACT GAAGAACAAA ATTATAGTGA ACAACGTGGG 1620 

AATTCTAAAA TATCACAGCA AAGTATAAAA TATAAAGATC ATTCACATTA CCATACGAAT 1680 

AAGCCAGGTA CATATGTTTC TGCAATTAAT GGTATTGAGA AGGAAACGCA CAAGCCAAAA 1740

ACACATAATA TGTATTCTAA TAATACAAAT CATCGTGCTA AAGATTCAAC TCCAGATTAT 1800

CACAAAGAAA GTTTCAAGAC TTCAGAGGTA CCGTCAGCTA TTTTTGGCAC AATGAAACCT 1860 



AAACAAAAAT ATGATAAATA TGTAGCTAAG ACGCAAACGT CTCAAAATAA ACAATTAGAA 1980

CAAGAAAAAC AAAATGATAG TGTTGTCAAA CAAGGAACTG CATCTAAATC ATCTGATGAA 2040

AATGTATCAT CAACAACAAA ATCAATGCCT AATTATTCAA AAGTTGATAA TACTATCAAA 2100

ATTGAAAATA TTTATGCTTC ACAAATTGTT GAAGAAATTA GACGTGAACG AGAACGTAAA 2160

GTGCTTCAAA AGCGTCGATT TAAAAAAGCG TTGCAACAAA AGCGTGAAGA ACATAAAAAC 2220

GAAQAGCAAG ATGCAATACA ACGTGCAATT GATGAAATGT ATGCTAAACA AGcGGAACgC 2280

TATGTTGGTG ATAGTTCATT AAATGATGAT AGTGACTTAA CAGATAATAG TACAGATGCT 2340

AGTCAGCTTC ATACAAATGG CATAGAGAAT GAAACTGTAT CAAATGATGA AAATAAACAA 2400

GCGTCAATAC AAAATGAAGA CACTAATGAC ACTCATGTAG ATGAAAGTCC ATACAATTAT 2460

GAGGAAGTTA GTTTGAaTCA AGTATCGACA ACAAAACAAT TGTCAGATGA TGAAGTTACG 2520

GTTTCGAATG TAACGTCTCA ACATCAATCA GCACTACAAC ATAACGTTGA AGTAAATGAT 2580

AAAGATGAAC TAAAAAATCA ATCCAGATTA ATTGCTGATT CAGAAGAAGA TGGAGCAACG 2640

aATAAAGAAG AATATTCAGk AAGTCAAATC GATGATGCAG AATTTTATGA ATTAAATGAT 2700

ACAGAAGTAG ATGAGGATAC TACTTCAAAT ATCGAAGATA ATACCAATAG AAACGCGTCT 2760

GAAATGCATG TAGACGCTCC TAAAACGCAA GAGTACGCAG TAACTGAATC TCAAGTAAAT 2820

AATATCGATA AAACGGTTGA TAATGAAATT GAATTAGCAC CGCGTCATAA AAAAGATGAC 2880

CAAACAAACT TAAGTGTCAA CTCATTGAAA ACGAATGATG TGAATGATAA TCATGTTGTG 2940

GAAGATTCAA GCATGAATGA AATAGAAAAG AATAACGCAG AAATTACAGA AAATGTGCAA 3000




AACGAAGCAG 


(TCI A & tx fZ Tf5 A 


r\\—£\r\r\r\ J. i. \_ 








3060 


AAGAAACAGA CTGAAAAGGT TTCAACTTTA AGTAAAAGAC CATTTAATGT TGTCATGACG 3120

CCATCTGATA AAAAGCGTAT GATGGATCGT AAAAAGCATT CAAAAGTCAA TGTGCCTGAA 3180

TTAAAGCCTG TACAAAGTAA GCAAGCTGTG AGTGAAAGAA TGCCTGCGAG TCAAGCCACA 3240

CCATCATCAA GATCTGATTC ACAAGAGTCA AATACAAATG CATATAAAAC AAATAATATG 3300

ACATCAAACA ATGTTGaGAA CAATCAACTT ATTGGTCATG CAGAAACAGA AAATGATTAT 3360

CAAAATGCAC AACAATATTC AGAGCAGAAA CCTTCTGTTG aTTCAACTCA AACGGAAATA 3420

TTTGAAGAAA GTCAAGATGA TAATCAATTG GAAAATGAGC AAGTTGATCA ATCAACTTCG 3480

TCTTCAGTTT CAGAAGTAAG CGACATAACT GAAGAAAGCG AAGAAACAAC ACATCCAAAC 3540

AATACTAGTG GACAACAAGA TAATGATGAT CAACAAAAAG ATTTACAGTC ATCATTTTCA 3600

AATAAAAATG AAGATACAGC TAATGAAAAT AGACCTCGGA CGAACCAACA AGATGTTGCA 3660

CCAAGTGTTT CATTACTAGA AGAACCACAA GTTATTGAGT CGOACGAGGA CTGGATTACA 3780 

GATAAAAAGA AAGAACTGAA TGACGCATTA TTTTACTTTA ATGTACCTGC AGAAGTACAA 3840

GATGTAACTG AAGGTCCAAG TGTTACAAGA TTTGAATTAT CAGTTGAAAA AGGTGTTAAA 3900 

GTTTCAAGAA TTACGGCATT ACAAGATGAC ATTAAAATGG CATTGGCAGC GAAAGATATT 3960 

CGTATAGAAG CGCCTATTCC AGGAACTAGT CGTGTTGGTA TTGAAGTTCC GAACCAAAAT 4020 

CCAACGACAG TCAACTTACG TTCTATTATT GAATCTCCaA GTTTTAAAAA TGCTGAATCT 4080 

AAATTAACAG TTGCGATGGG GTATAGAATT AATAATGAAC CATTACTTAT GGATATTGCT 4140 

AAAACGCCAC ACGCACTAAT TGCAGGTGCA ACTGGATCAG GGAAATCAGT TTGTATCAAT 4200

AGTATTTTGA TGTCTTTACT ATATAAAAAT CATCCTGAGG AATTAAGATT ATTACTTATC 4260 

GATCCAAAAA TGGTTGAATT AGCTCCTTAT AATGGTTTGC CACATTTAGT TGCACCGGTA 4320 

ATTACAGATG TCAAAGCAGC TACACAGAGT TTAAAATGGG CCGTAGAAGA AATGGAACGA 4380 

CGTTATAAGT TATTTGCACA TTACCCATGT ACGTAnTATA ACAGCATTTA ACnAAAAAGC 4440 

CCCATATGAT GAAAGAATGn CAAAAATTGT CATTGTAaTT GATGAGTTGG CTGATTTAAT 4500 

GATGATGGTC CGCAAGAAGT TG 4522

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 751 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

TCAACTTTAC GGATACGTAT ATATTTTGCA TGACATTTAG TGCAATAATA TTCATAATTT 60 

GCCCGTTGTT GATAGCTTTC AATGCTGTTA CAAAATCTAG GCGCTCCAAC CTGTTGGCTC 120 

AATCGTTTAA AATCTTGATC TTTATGTTGA TAACCTTTAC CAGCAATATG CAAGTGATAA 180 

TGACACAATT CGTGCAGTAT AATTTTTACA ACAGCATCTT CTCCATAATG CTCATATTGT 240 

TTTGGATTAA TTTCAATATC ATGGGACTTT AAAAGATAAC GTCCGCCTGT TGTACGTAAC 300

CTTTTATTAA AATATGCACA ATGTCGAAAC GTACGTCCAA ATTTTTCTTC CGAAAGATTC 360

TCAACCATTC GCTGAAGTTT GTCATTATTC ATGTGGATCA ATCATCGTTA ATGATACTTT 420

GTCTTTATTT TTGTCAATAC TGTAAATCCA AACGTCAACG ATATCACCAA CACTGACAAT 480

ATCCATTGGA TTTTTTACGA ACTTCTTAGA AAGTTTCGAA ACATGGACAA GTCCATCTTG 540

TTTCATTCCT TCTTGTAAAT CTTCAATTGA TAGCACATCG GATTTAAGGA TTGGTGTTTC 660

AAACTCGTCC CTTGGATCTC GATTAGGTGC GTTCAAOGAT TTAATAATAT CCTCTAATGT 720

AGGTACACCG ACTTGTAATT CAATCGCCAG T 751

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1076 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

TCTCCAGCTT TAACTTGATC TGGCACTTTA ACAATTGTCT GATCCATACA TACGCGACCA 60

ATAACTTCGC ATTGATGACC ATTTACATTT ACAAAGCTAC CTTGCATTAT GCGTAAATGG 120

CCATCTGCAT ATCCAATAgG TAACAATGCT ATTGTAGTTG GGTCAGTAGC TGTATAAGTT 180




TTACAGACTC 


/\L.Q.<JV?i. J. lul 


AGCGTCTTTG 


TTTGAACTAC 


ATT AG CAATT 


240 


AATTGCACAC TTGGTTTAAG GTGTACTTTA ACTTTTTGCT GTACATACTC TGATGGATAA 300

TATCCATAAA GGGAAATTCC TGGTCTTATT GCATTACAGA ATTGGCAATC CATTAATAGA 360

GAGCCTGCTG AGTTCTGACA ATGTATATAT TCAGGTTTAA TTGCTTCATT GACCATATCT 420

TTAAAACGTT GATATTGTTC AGTTGTCATA TCTCCTGGTT CGTCAGCACA GGCAAAGTGT 480

GTAAACACGC CTTCAAATAC AAGTTGCTCA TATTGTTGAA TGATTTCAAT CACTTCTTGA 540

TACGTTTTAG TATCTTTAAT ACCTAAACGT CCCATTCCTG TATCTAATTT AATGTGCAAC 600

CATAACTTTT TCTCTTGCTC ACCAGAAATG TTTTTAATTG CTTCTTTCAA CCACTGTTTA 660

GACGGAACCG TTAAGGCAAC TCGGTGTTGT ATCGCTTTAT CAATATCTTT AGCTGGTAAC 720

ACACCTAAGA CTAAAATTTT AGCAGTAATC CCATGCATTC TAAGTTCTAT CGCTTCATCT 780

AACGTTGCTA CAGCAAAAAA TGTGGCGCCA TTTTCCATTA AATGACGTGC TACTTTAACA 840

CTACCTAGTC CATAGGCATT GGCTTTAACG ACAGCCATCA CTGTTTTATT TGGATGCAAT 900

GTACTGAATA CTTTGAAATT TGATGCAACA GCGTTTAAAT CTACATTCAT ATACGCAGAT 960

CTATAATATT TATCCGACAT ATTACTTCCT CCTGTAATTC CCACACGTTT TAAAACTAGA 1020

TCTTAATTAT CATTGTATAA CAAATTTAAA ATGCTGACTT TTCTAAAACA ACTTGG 1076



(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2930 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

TGACCACAAT GCCCAATACA ACCATCCCAT GGTAAAGCCA AOAOATOAGT CAATAAAGCG 60 

TGTTGAATAA GAGCTGAATG AACCTGATAC TGGATAAAAT GTTGCCAACT CTCCAATTGA 120 

TGACATTAAG AAATATAGCA TGACACCAAT AACAAGATAA GCGAGTATAG CGCCTCCAGG 180 

ACCAGCTTGA GAAATGATAT TACCAGTAGC TACAAATAGA CCAGTCCCAA TTGCACCACC 240 

TATAGCAATC ATGGAAATGT GTCTTGAGTT AAGACTACGG TTCATTTTAT TATCTTCCAT 300 

ATTTAGTCTC CCATCTATTT AAATATACCC ATTATTGTAA GCTTTTTAAG TGTACTATTC 360

AATAACTATT TAGTACTGTA AAGCGAAAAA ATTAAAATTT TCTGATTTTT TAATCATCTT 420

GAGCATGTTT AATTGTAATT TTGATGGGGT TAAATTATAA TATGTATTAA ATTATAATTA 480

TnATAAATTG TGGAGGGaTG ACTATGTCAC AACAAGACAA AAAGTTAACT GGTGTTTTTG 540

GGCATCCAGT ATCAGACCGA GAAAATAGTA TGACAGCAGG GCCTAGGGGA CCTCTTTTAA 600

TGCAAGATAT TTACTTTTTA GAGCAAATGT CTCAATTTGA TAGAGAAGTA ATACCAGAAC 660

GTCGAATGCA TGCCAAAGGT TCTGGTGCAT TTGGGACATT TACTGTAACT AAAGATATAA 720

CAAAATATAC GAATGCTAAA AtATTCTCTG AAATAGGTAA GCAAACCGAA ATGTTTGCCC 780

GTTTCTCTAC TGTAGCAGGA GAACGTGGTG CTGCTGATGC GGAcGTGACA TTCGAGGATT 840

TGCGTTAAAG TTCTACACTG AAGAAGGGAA CTGGGaTTTA GTAGGGAATA ACACACCaGT 900

ATTCTTCTTT AGAGATCCAA AGTTATTTGT TAGTTTAAAT CGTGCGGTGA AACGAGATCC 960

TAGAACAAAT ATGAGAGATG CACAAAATAA CTGGGATTTC TGGaCGGGTt TCCAGAAGCA 1020

TTGCACCAAG TAACGATCTT AATGTCAGAT AGAGGGATTC CTAAAGATTT ACGTCATATG 1080

CAT6GGTTCG GTTCTCACAC ATACTCTATG TATAATGATT CTGGTGAACG TGTTTGGGTT 1140

AAATTCCATT TTAGAACGCA ACAAGGTATT GAAAACTTAA CTGATGAAGA AGCTGCTGAA 1200

ATTATAGCTA CAGATCGTGA TTCATCTCAA CGCGATTTAT TCGAAGCCAT TGAAAAAGGT 1260

GATTATCCAA AATGGACAAT GTATATTCAA GTAATGACTG AGGAACAAGC TAAAAACCAT 1320

AAAGATAATC CATTTGATTT AACAAAAGTA TGGTATCACG ATGAGTATCC TCTAATTGAA 1380

GTTGGAGAGT TTGAATTAAA TAGAAATCCA GATAATTACT TTATGGATGT TGAACAAGCT 1440

GCGTTTGCAC CAACTAATAT TATTCCAGGA TTAGATTTTT CTCCAGACAA AATGCTGCAA 1500

GGGCGTTTAT TCTCATATGG CGATGCGCAA AGATATCGAT TAGGAGTTAA TCATTGGCAG 1560

GGTCAAATGC GCGTAGTTGA CAATAACCAA GGTGGAGGAA CACATTATTA TCCAAATAAC 1680

CATGGTAAAT TTGATTCTCA ACCTGAATAT AAAAAGCCAC CATTCCCAAC TGATGGATAC 1740

GGCTATGAAT ATAATCAACG TCAAGATGAT GATAATTATT TTGAACAACC AGGTAAATTG 1800

TTTAGATTAC AATCAGAGGA CGCTAAAGAA AGAATTTTTA CAAATACAGC AAATGCAATG 1860

GAAGGCGTAA CGGATGATGT TAAACGACGT CATATTCGTC ATTGTTACAA AGCTGACCCA 1920

GAATATGGTA AAGGTGTTGC AAAAGCATTA GGTATTGATA TAAATTCTAT TGATCTTGAA 1980

ACTGAAAATG ATGAAACATA CGAAAACTTT GAAAAATAAA TTTGATATGT AGTTTCTATA 2040

TTGCGTAGTT GAGCAGTTTA TGATATCATA ATAAATCGTA AAGATTCCTA ACAAGAGAGG 2100

GTGTTTAACG TGCGCGTAAA CGTAACATTA GCATGCACAG AATGTGGCGA TCGTAACTAT 2160

ATCACTACTA AAAATAAACG TAATAATCCT GAGCGTATTG AAATGAAAAA ATATTGCCCA 2220

AGATTAAACA AATATACGTT ACATCGTGAA ACTAAGTAAT TCTTATCATT CAAATACGAC 2280

GATTTGAAAA TAAAGCGGGC TTACCTATTA TATTGGGGAG CTCGCTTTTT TATGAAATTT 2340

TTGTGAAGAG TGATTAATGG ATTGAGTTTC ATCGGTAGAA CAATATATGA TTATATTAGT 2400

TGTTACTTTA TTAAAaTTTG AGAATATTTA TAGAAGGAAA TAGATTACTG ATTTTATAAA 2460

GTCACTTTGT TAGCGAATGC TTGAAAGAGT ATTTAATATA GTAGAATTTA AAATTTCAAA 2520

GCGGAATTTA ATAAGTACGA AGTAGTTCTG GGTATGTTTT ATAAATGTTC GATAATACAC 2580

TTTAATCTTA AATATGATGG TTTAGAAAAT GATTTAACAA AGAAATGAaA CTTTACTGTT 2640

GAATTATGTG AGGATTGTGT TATTATATAA ATCGTAATAA TTACGATTTG ATAAAAAGTG 2700

AGGTAACTAT ATATGGCTAA GAAATCTAAA ATAGCAAAAG AGAGAAAAAG AGAAGAGTTA 2760

GTAAATAAAT ATTACGAATT ACGTAAAGAG TTAAAAGCAA AAGGTGATTA CGAAGCGTTA 2820

AGAAAATTAC CAAGAGATTC ATCACCTACA CGTTTAACTA GAAGATGTAA AGTAACTGGA 2880

AGACCTAGAG GTGTATTACG TAAATTTGAA ATGTCTCGTA TTGCGTTTAG 2930

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3606 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

CTTCTTGCCA TGGCTCTCTT TATTTAAAAA TGCTTCCAAC TTGTCCATTT GATTGTTTCT 60

TTATAAAAAA CTAATTTTAC AAATGCTTTT GCGTTCTTAC AAAAAATGCA TTTGACTATT 180

ATTATAATAA GCGTATAATT GTCGCATATT ATTTTTTGTA TTTTTGGCAA TAACGAAGGA 240

GTATTTATGA ATAAAGACAA GCAATTGCAC AACGACAAAA TCAATCTATC CCAATTAGTC 300

TTATTAGGGT TAGGCTCTTT AATAGGATCT GGTTGGCTAT TTGGTGCGTG GGAAGCATCA 360

TCAATAGCTG GACCAGCAGC AATCATATCA TGGGTTCTTG GATTCCTAGT CATTGGAACC 420

ATTGCCTATA ACTACATTGA AATCGGCACA ATGTTTCCTC AATCAGGTGG CATGAGTAAC 480

TATGCCCAGT ATACACATGG CTCATTATTA GGCTTTATTG CTGCTTGGGC GAATTGGGTG 540

TCTTTGGTGA CAATAATACC TATCGAAGCT GTGTCAGCTG TTCAATATAT GAGTTCTTGG 600

CCGTGGCATT GGGCGAAACC AATGAGATAT TTAATGGAAA ATGGCTCTAT TAGCACATAC 660

GGATTGCTAG CTGTATATCT CATCATTGTT ATTTTTTCAT TATTAAACTA TTGGTCCGTA 720

AAACTTTTAA CATCATTTAC GAGTTTAATT TCTGTATTTA AATTAGGCGT ACCCATGTTA 780

ACCATCATCA TGTTGATGCT ATCAGGATTC GACACTTCAA ATTAOGGCCA TTCGGCAAGC 840

ACATTTATGC CTTACGGAAG TGCACCGATT TTTGCTGCAA CAACAGCATC AGGGATTATT 900

TTTTCATTCA ATTCATTCCA GACAATTATT AATATGGGTT CAGAAATTAA AAATCCTGAA 960

AAAAATATCG CAAGAGGCAT CGCTATCTCA CTGTCAATCA GTGCAGTGTT GTACATCATT 1020

TTACAAAGTA CGTTTATCAC TTCTATGCCT CAATCAATGT TACAACATAG TGGATGGAAT 1080

GGCATCAACT TCAATTCACC ATTTGCTGAT TTAGCTATCT TATTAGGAAT TAATTGGCTC 1140

GCAATTTTAC TATACATTGA AGCTTTTGTA TCACCATTCG GTACTGGCGT GTCATTTGTC 1200

GCCGTTAGAG GTCGAGTTTT ACGAGCAATG GAGAAAAATG GACATATCCC TAAATTTCTT 1260

GGGAAGATGA ATGAAAAATA TCATATCCCA CGTGTAGCAA TCATCTTTAA TGCCATCATT 1320

AGTJQXSATTA TGGTTACATT ATTTAGAGAT TGGGGTACGC TAGCAGCAGT TATTTCTACT 1380

GCAACTTTAG TAGCCTATTT AACTGGCCCA ACGACAGTGA TTGCATTAAG AAAAATGGGA 1440

CCAACAATGA CTCGTCCATT TAGAGCAAAA ATTTTAAAAG TAATGGCACC ATTATCATTT 1500

GTATTAGCTT CATTAGCTAT ATATTGGGCA ATGTGGCCAA CAACGGCTGA AGTTATTTTA 1560

ATCATTATAC TTGGATTACC AATCTACTTC TTCTATGAAT ATCGTATGAA TTGGCGTAAT 1620

ACAAAGAAAC AAATTGGTGG TAGCTTATGG ATTATTGTAT ATTTAATCGT GCTATCAATA 1680

CTGTCATTTA TAGGAAGCAA AGAATTTAAA GGCTTAAATA TGATTCACTA TCCATTTGAC 1740

TTTATCGTTA TTATTATTGT GGCACTTATC TTCTATTACA TCGGTACAAC GAGTTCATTT 1800

GAAAGCGTCT ATTTCCGTCG CGCAACACGA ATCAATACGA AGATGCGTGA GTCACTAAAT 1860

CACACACATT AACCAACCAT TGATTTCAAC ATCTTGGTTG GTTTTTTATT TTGAAAATCG 1980

GTTATAAATA ACTAACATAA CAAGATGATG ATCAGGCTGG GACATAAATC AATGTTCTAT 2040

GCTCTACGAA gTTATATTGG CAGTAGTTOA CTGAACGAAA ATGCGCTTGT AACAAGCTTT 2100

TTTCGATTCT AGTCAGGGGC CCCAACACAG AGAATTTCGA AAAGAAATTC TACAGGCAAT 2160

GCAAGTTGGG GTGGGACGAC GATAAAGAAA TACTTTTTCT ATAGAAATTA GTATytCTTA 2220

TGCATGAGTT TTACTCATGT ATTCATATTT TTAAGTACAC ATTAGCTGTG GCTAATGTAT 2280

AAGAACCACT ACATAATAAA TCATTTGTGG CTCTTTATCA TTTCTGTCCC ACTCCCGTAG 2340

AAGTACATCA TATAATGCTG AAAATGGTTT GAGTTAAAAC AGATATCAAG CTCGTCTGAT 2400

TCAGTCACAA AATTGTCTTG TTATACTTGT CACCTATCAT CTATAGACCG TGGTATGATT 2460

AAATTGGGGA TGATAAAGGA GGTTAATAAA TATGAAGATT AATACTACAG GTGGTCAAAT 2520

TCATGGTATT ACACAAGATG GTTTAGATAT CTTCTTAGGC ATTCCTTATG CAGAACCACC 2580

AGTTCATGAC AATCGCTTTA AACATTCTAC GTTAAAAACA CAATGGTCAG AGCCAATTGA 2640

TGCAACTGAA ATACAACCCA TCCCACCGCA ACCAGACAAC AAATTAGAAG ATTTTTTCTC 2700

CTCACAATCT ACAACTTTTA CTGAACATGA AGACTGTTTA TATCTAAATA TTTGGAAACA 2760

ACATAATGAT CAGACGAAGA AACCTGTCAT CATTTATTTT TATGGTGGTA GTTTTGAAAA 2820

TGGTCATGGT ACAGCCGAAC TCTATCAACC GGCACATTTA GTACAAAATA ACGACATTAT 2880

CGTTATTACA TGCAATTATC GTTTAGGCGC ATTAGGATAT TTAGACTGGT CATATTTTAA 2940

TAAAGATTTT CATTCCAATA ATGGCCTTTC AGATCAAATC AATGTCATAA AATGGGTGCA 3000

TCAATTTATT GAATCCTTCG GTGGCGACGC TAATAACATT ACTTTAATGG GTCAGTCTGC 3060

AGGGAGTATG AGCATTTTGA CTTTACTTAA AATACCTGAC ATTGAGCCAT ACTTCCATAA 3120

AGTCGTTCTA CTAAGTGGCG CACTACGATT AGACACCCTT GAGAGTGCAC GCAATAAAGC 3180

ACAACATTTC CAAAAAATGA TGCTCGATTA TTTAGATACA GATGATGTTA CATCATTATC 3240

GACAAATGAT ATTCTTATGC TGATGGCGAA gcTAAAACAA TCTCGAGGAC CTTCTAAAGG 3300

GCTTGATTTA ATATATGCGC CTATTAAAAC AGATTATATA CAAAATAATT ATCCAACAAC 3360

GAAACCAATT TTTGCATGTT ATACAAAAGA TGAAGGCGAT ATTTATATTA CTAGTGAACA 3420

GAAAAAATTA TCGCCGCAAC GCTTTATCGA CATTATGGAA TTAAATGATA TTCCTTTAAA 3480

ATACGAAGAT GTTCAGACGG CGAAGcAACA ATCTTTAGCG ATTACACATT GTTATTTCaA 3540

ACAGCCGATG aAGCAATTTT TACmACmACT CAATATACmA GATTCCAACC GCACCAACTA 3600


TGGCTT 3606

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15109 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

GAAATTAAAA AAGCAATTGG nACAAGATGC AACAGTGTCA TTGTTTGATG AATTTGATAA 60 

AAAATTATAC ACTTACGGCG ATAACTGGGG TCGTGGTGGA GAAGTATTAT ATCAAGCATT 120 

TGGTTTGAAA ATGCAACsAG AACAACAAAA GTTAACTGCA AAAGCAGGTT GGGCTGAAGT 180 

GAAACAAGAA GAAATTGAAA AATATGCTGG TGATTACATT GTGAGTACAA GTGAAGGTAA 240 

ACCTACACCA GGATACGAAT CAACAAACAT GTGGaAGAAT TTGAAAGCTA CTAAAGAAGG 300 

ACATATTGTT AAAGTTGATG CTGGTACATA CTGGTACAAC GATCCTTATA CATTAGATTT 360 

CATGCGTAAA GATTTAAAAG AmAAATTAAT TAAAGCTGCA AAATAATTCA G CT AT AT AAG 420 

TTAGTGAAAT GAGAGTCTGA AACATATCAA TCTTTTGATA TTGTATTAGG CTCTTATTTT 480 

TAT AG CT AG A AAGTTAGATA TTTGTATTTT TTTAAATAAT AAGTGCCGTT GTTATCGTTC 540 

AATTTAATTA ATGATAGATT AGTATTATTA TAGCTAAAGT AGTATACCTG AGAAAATAGC 600 

TCAATGTATC TCTTTATTAA TAAGTTATAT CATAATTATT TT AGTG CAT A CTTTATGGAA 660 

GGGATATCAG GGAATGGCTT TCAATTAAAG AAGAGGTTTA AAAGGATTAC AACAGAATGT 720 

TATGATTTTG TAGAAAGATA TATAACAACG TTTTATAAAA ACATAATATT GTTAATGGAA 780 

AATGAAATGT AAGGGGGATT T CG AGTG ACT AAGAAAGTTT ATTTTAACCA CGATGGTGGT 840 

GTAGATGATT TAG TATCT CT ATTTTTATTA TTACAAATGG AAAACGTTCA ATTGATAGGG 900 

GTOSGTACAA TTGGTGCTGA TTGTTATTTA GAGCCATCTT TGAGCGCATC AGTAAAAATT 960 

ATTAATCGTT TTTCAAATGA AGATATTCAA GTTGCGCCAT CATATGAACG AGGAAAAAAT 1020 

40 CCATTTCCTA AAGAATGGCG TATGCATGCC TTTTTTATGG ACGCATTGCC AATTTTAAAT 108 0 

GAG CCAGTCA AACATGTTGC TTCAAATGTG AGCGACAAAG AAGCCTTTGA AGACATTATT 1140 

CAAACTTTAA AGAGACAATC AGAAAAAGTA ACATTATTAT TTACAGGCCC GCTTACAGAT 1200 

45 TTAGCAAAAG CACTACAAAA AGATTCATCT ATCGTTCAGT ATATAGAAAA ATTAGTTTGG 126 0 

ATGGGTGGCA CCTTTTTACC AAAAGGAAAT GTTGAAGAAC CTGAGCATGA TGGTTCTGCA 1320 

GAATGGAATG CATATTGGGA TCCAGAAGCG GTTAAAATTG TTTTTGATAG CGATATAGAG 1380 

50 ATTG AT ATGG TTGCTTTAGA AAGTACGAAT CAAGTACCGC TAACGTTAGA TGTTAGACAA 1440 
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GTACCACCAT TAACACACTT TATAACAAAT 
ACTGCTTATA TTGGTAACAA GGACTTOGTT 

5 AGTTATGGAC CAAGTCAAGG TAAGACATTT 

ATAAATCATG TAGATAACAA CGCATTTTTT 
AATTAACAGC TGTQTAGAAT AATTAAGGTT 

10 TTTTCATTTC TTAAAGTTTA CAATGGTGCT 

TAAAAAATGA CAACAAAACA GTTAGTATAT 

TTAGGATTGG TACOGGTAAT TCCACTACCA 

IS 

ATTGGTATTT TCTTAGCAGG TGCGATTTTA 

GTCTTTTTAT TATTAGTAGT TGCTGGCTTG 

GGTGTATTCG CAGGTCCTTC AGCAGGGTTT 

20 

ATTGGGGCGA TTCGAGATAG ATTCATCAAT 
ATTTTAGTTT TTGGTGTTAT AG CATT AGAT 
ATTAACATAC CATTTACGAA AG CT ATTTCA 

25 

TTAAAAGCAA TTGTAGCAAG TTTGATTGGT 
CAAATTATGG GAATAAAATA AT CAT ATTT A 
GAAATTTATA AAAGTGAAAG GAGTAGGTGT 

30 

ATTGTAACGG CACTATATTT GAAAATGACG 
TTCCACAACG ATACACAAGT AACACATGGA 
GGCTGAAAGA CTTAGAACGT CAACATCAAT 

35 

CGTTTAGTTT CCCGGAAAAT GAACAACTTG 
TGAATTTTGA ACTAGGTATT ATGGAATTGT 
TGCCGCGTAA CTCTGACGTT GAAATTGCCA 

40 

TAAAAGTTGC ATATCAGTTT AGTTTG C CAT 
AAATGGTAAG GGAACATTAT CAAAAAGATG 

4S ATGAACCTAT TGGCGTTGTA GATGTCATTG 

TTGGTGTATT AGAACAATTT CGGCACCAAG 
GTGAATACGC CATATCAAAA AATCACAAAC 

SO CAGCAAAAGA TATGTATGCA AAGCAAGGTT 
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TCTACTTACT TTTTATGGGA TGTTTTAACG X560 

CATTCAATTG AGAAAAAAGT CGATGTAATA 1620 

GAGTGTAAAG ATGGGCGCAA AATTAATGTC 16 80 

GATTATATAA CTGCACTTGC TAAAAAAGTA 1740 

TTAATTTATA TAGAACAACT TATTGTAAAC 1800 

ATAATAATGG TCATGAAATA CGAAAGGAAG 1860 

ACAGCTTTAA TGACAGCGAT TATCGCTATT 1920 

TTTTCTTCAG TACCAATTGT ACTTCAAAAC 1980 

GGACGTAAAT ATGGCACATT AAGTGTTATC 204 0 

CCATTGTTAT CAGGTGGTCG CGGTGGCATC 2100 

TTACTATTAT ATCCAGTTGT AGCATTCATG 2160 

GAAATTAATT TCTGGATTTT ATTCGTTGGT 2220 

GTTATTGGTA CATTGATTAT GGGCATGATT 2280 

ATTTCATTAG CTTATTTGCC TGGTGATATA 2340 

ACAGCTTTAC TTAATCACTC GCAGTTTCGT 2400 

AGATAGTAAA GTAATTGAAT AAGTTGCTTT 24 60 

CAATGGCTAG TATAAGTATG TCAGATATAT 2520 

ACGAG CAGTT GATTTATTTA ACGCCTTCTT 2580 

TATATAAAAA GACGCCTACC CAAGAGCGAT 2640 

TACATACAAA TCAAGGTTCA AATCATTATG 2700 

ATAATCATTG GATGGCTATG TTTAAAGATA 2760 

ATGCCATAGA AAGTGATGCG CTTGCCAATT 2820 

TCGTTGACGA GTCG CAT AT A GATGCCTATT 2880 

TTGGAAAAGA CTATGCAGAT GCACATGAAG 2940 

TGATTAAACG CTTAGTAG CT TATTTAAATA 3 000 

AAAGTGAAAA TTACATTGAA TTAGATGGAT 3 060 

GAATTGGATC TACAATTCAA TCGTTGATAG 3120 

CAATCATATT AG TTG CAGAT GGTGAAGATA 3180 

ATGTCTATCA AT CGTTTTGT TATCAAATAT 3 240 
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TAAGCTGGTT TCGAGTAGAA ATCAACTTAC TGCTTTTTAA ATTGTTTTGA GCTACTTATA 3360 

CTTATAAAAA TAGTGCGTTT AAATTGTTGA TTCATGTAGA ATATCGTTCA TTATGACACA 3420 

CTATAATGAA TATGTTATTG TTCAGAATCA ATGATACGTT CTGGATGACT GTATATATTA 3480 

AAGCCACCAT TTCGAATAAA TCCAACTGCC GTAATATTTA GGTCATTAGC TAAGGTTACA 3540 

GCAAGCGTTG TCGGAGCTGA TTTAGATAAA ATGACGCCAA CACCAATTTT TGCGGCTTTA 3600 

ATTAAAATTT CTGATGAAAT ACGTCCACTA AAAATTAATA CTTTATCTCG GACAGTAATA 3660 

TGTCGCTGAA TACAAAATCC ATATAATTTA TCTAGAGCGT TATGTCTACC AATGTCTTGT 3720 

CGATGTACAA AAAATGTCAA ACCATCGCTT ATAGCAGCAT TATGTAAGCC ACCTGTTTCT 3780 

TGGTAAATAT GACTTGCACT TTGTAATCGA GTCATCATGT TAATAATTTG CATTGGAGTT 3840 

AAAGTGATTT TAGACATAGA TGTTTTAGCG ATAGCAGCAT CATTTTGAAA ATAAAACTCA 3900 

CGACTCTTTC CGCAACAAGA TGCAATCATT CGTTTTGTGG AATATTGAAA GCGATCGCCT 3960 

AAATCTTTAT TAAGTTCAAC ATGGGCAAAA CCTTTACTAT CATCAATCAG TACAGATTTT 4020 

AATTCATCTC GCTTTAAAAT GGCACCTTCC GAAGCCAGAA ATCCAATGAC TAACTCCTCA 4080 

AGGTTTGTTG GACTGCATAT AACAGTCGCA AATTCTTCAC CATTCACCAT AATTGTAAGT 4140 

GGAAATTCTG TCACATATTG ATCTGTTGTA TTGAATAATT TTCCATCTTC ATATCTAACA 4200 

ATTGGTTGAC CTAAAGATAC ATCTTTGTTC ATTATCTAAC CCCTTTAATT AGCTTAAACT 4260 

TTATTTTAAA GCAATTTGCT TAAAATTTTA ACATATTTGC TTAAGTTTGA AATTTGATTG 4320 

ATAAAAATTA ATAGCGAGCA ATCTGTTTGA TTTAAATTGA ATTCGAGAAT ATACATACTA 4380 

GGGCATCAAT TAATAAATAT CAATCTTATG CAAATTTGAC AATTGTTTGA ATCAATATAT 4440 

AAACAGGCAA CGGTTCTTTT CAAATATAAT AGTAAGTGTA TAATGAAAAT GTAAATATTA 4500 

TTAAAAATGG GGGTTCACTC AATGAAATTG AAACGTTTAT TTGCTGTTGT GATTGCAATG 4560 

CTTTTAGTAT TAGCTGGTTG CTCTAATTCT AACGATAATA ATGAAAGTAA AAAAGATGAC 4620 

GCAGACAATG GTAAGAAACA AGAGATTCAA GTTGCAGCGG CAGCAAGTTT AACAGATGTA 4680 

ACCAAGAAAT TAGCTTCAGA ATTTAAAAAA GAGCATAAAA ATGCTGATAT TAAATTTAAC 4740 

TATGGTGGAT CAGGGGCATT AAGAAAACAA ATTGAATCAG GCGCACCTGT TGACGTATTT 4800 

ATGTCTGCAA ATACTAAAGA TGTAGATGCA TTAAAAGACA AGAATAAAGC GCATGATACA 4860 

TATAAATATG CGAAAAATAG TCTAGTATTA ATTGGTGATA AAGATTCAAA TTACACTTCA 4920 

GTAAAAGACT TAAAAGACAA TGATAAATTA GCATTAGGTG AAGTGAAAAC TGTACCAGCA 4980 

GGAAAATATG CGAAACAGTA TTTAGATAAC AATAACTTAT TTAAAGAAGT CGAAAGTAAA 5040 
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CAAGGTTTTG TGTATAAAAC TOACTTATAT AAACAAAATA AAAAAATTGA TACTGTAAAA 5160 

GTAATTAAAG AAGTAGAACT TAAGAAGCCA ATCACATACG AAGCTGGTGC TACATCAQAT 5220 

AGTAAATTAG CAAAAGAGTG GATGGAATTC TTAAAATCAG ATAAAGCTAA AGAAATACTA 5280 

AAAGAATACC ACTTTGCAGC ATAAGGAGTT GTAATCCATG CCTGACTTAA CACCTTTTTG 5340 

GATATCAATA CGAGTTGCTG TAATCAGTAC GATTATTGTA ACGGTTTTAG GTATTTTTAT 5400 

ATCTAAATGG TTGTATCGTC GTAAGGGTTC GTGGGTTAAA GTATTGGAAA GTTTATTGAT 5460 

ATTACCTATT GTTTTGCCGC CAACGGTATT AGGTTTTATT CTATTAATCA TCTTCTCGCC 5520 

AAGAGGACCA ATCGGTCAAT TCTTTGCGAA TGTACTACAT TTACCTGTAG TGTTCACTTT 5580 

GACAGGTGCT GTGATAGCAT CTGTCATTGT TAGTTTTCCA CTAATGTATC AACATACTGT 5640 

GCAAGGCTTC AGAGGTATAG ACACGAAAAT GATTAATACA GCTAGAACGA TGGGAGCAAG 5700 

TGAAACGAAA ATTTTCCTCA AATTAATTTT ACCATTAGCT AAACGCTCTA TTTTAGCAGG 5760 

TATAATGATG AGTTTTGCTC GTGCATTAGG TGAGTTTGGT GCTACATTAA TGGTTGCAGG 5820 

ATATATTCCA AATAAAACGA ATACACTACC TTTAGAAATA TACTTCTTAG TGGAACAAGG 5880 

TAGAGAAAAT GAAGCGTGGT TATGGGTATT AGTGCTAGTC GCATTCTCTA TTGTGGTTAT 5940 

ATCTACAATT AATTTATTGA ATAAAGATAA ATATAAGGAG GTCGACTAGA TGCTTAAAAT 6000 

CAATGTGAAA TATCAATTAA AGAACACTTT AATTCGCATC AATATAGATG ATACTGAACC 6060 

AAAAATTTAT GCAGTTCGTG GTCCATCTGG CATTGGTAAA ACTACTGTTT TAAATATGAT 6120 

TGCCGGATTA CGTAAAGCAG ATGAAGCTAT TATCGAAGTG AATGGGCAAT TACTTACTGA 6180 

TACGGCAAAA AACGTGAATG TTAAAATTCA ACAACGACGT ATTGGATATC TGTTTCAAGA 6240 

CTACCAATTG TTTCCTAATA TGACGGTCTA TAAAAATATT ACTTTTATGG CTGAACCATC 6300 

TGAAeACATC GATCAATTAA TTCAAACTTT AAACATTGAT CATTTGATGA AACAATATCC 6360 

TATGACATTG TCAGGTGGAG AGGCACAACG TGTAGCACTT GCACGTGCAC TTAGCACrAA 6420 

ACCAGATTTA ATTTTATTAG ATGAACCTTT TTCTAGTTTG GATGATACTA CAAAAGATGA 6480 

GAGTATTACA TTAGTTAAAC GTATTTTCAA CGAATGGCAA ATACCAATCA TATTTGTGAC 6540 

ACATTCAAAC TATGAAGCAG AACAAATGGC TCATGAAATT ATTACAATTG GGTAATCATT 6600 

TATTTGCCAT TAAAGAGTTT AGAACGTATT TAAAATTGTA GAAGTGAATG CTTCTATCAG 6660 

CATTTTAATG ATGTTTTAAA CTCTTTTTTA GGGGCAGTTT TTTTGAGAGA CATTGACGCG 6720 

CGTCATATAA TGAAAGTAAT GATAAAAAGA AAGGATAACT TAATGTGAGT CAAGAACGTT 6780 

ATTCAAGGCA AATTTTATTT AAACAAATAG GTGAAATAGG TCAAAGCAAA ATAAATCAAA 6840 
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GAGCAGGCAT TGCCAAACTA ATCATTGTTG ATAGAGATTA TATTGAATTT AGTAATTTAC 6960 

AAAGACAAAC ATTGTTTACT GAAGAAGATG CTTTGAAAAT OATGCCTAAG GTGGTTGCAG 7020 

CTAAAAAGCA TTTGCTAGCG TTACGTAGTG ATGTTGATAT TGATGATTAT ATTGCCCATG 7080 

TGGATTATTA TTTTTTGGAA ACACATGGAC AGGACGTTGA CGTTATTATT GATGCAACCG 7140 

ATAACTTTGA AACACGACAA CTGATTAATG ATTTTGCATA TAAATATCGT ATACCTTGGA 7200 

TTTATGGTGG TGTTGTACAG AGTACATATA CAGAAGCTGC ATTTATACCT GGTAAAACAC 7260 

CTTGCTTTAA CTGTTTGGTA CCACAATTGC CAGCATTAAA TTTAACATGT GATACAGTAG 7320 

GGGTCATTCA ACCTGCCGTG ACGATCGCAA CAAGTTTACA ATTAAGAGAT GCGATGAAAG 7380 

TATTAACGGA ACAACCAATT GACACAAAAA TAACTTATGG CGATATTTGG GAAGGTAGTC 7440 

ATTATTCATT TGGTTTCAGT AAAATGCAAC GTTCAGACTG TACAACTTGT GGAGATGTAC 7500 

CAAGTTATCC GTATTTAAAC AAGAATGAAC AACGTTATGC AACATTGTGT GGTAGAGACA 7560 

CTGTACAGTA TGAAAATGCA TCAATTACAC ACGACATTCT TGTTCAATTT TTAAAACAAC 7620 

ATCAGTTAAA TTATCGCAGT AATTCGTATA TGGTTATGTT TGAATTTAAA GGACACCGCA 7680 

TTGTTGCTTT TAAAGGTGGA AGGTTTTTAA TACATGGCAT GACACGCACA TCAGATGCCA 7740 

CACATCTAAT GAATTTATTG TTTGGATAAA AAAAGATAAG ACAAAAGGAG TGTAATATTA 7800 

TGGGCGAACA TCAAAACGTT AAATTGAATC GTACAGTTAA AGCAGCCGTA CTAACGGTAT 7860 

CAGATACTAG AGACTTTGAT ACAGATAAAG GTGGTCAATG CGTGCGCCAA CTATTACAAG 7920 

CAGATGACGT TGAAGTGAGT GACGCACATT ATACAATTGT GAAAGATGAA AAAGTAGCCA 7980 

TCACGACGCA GGTGAAGAAG TGGTTAGAAG AAGATATTGA TGTCATCATT ACGACTGGTG 8040 

GAACAGGTAT TGCACAACGT GATGTGACGA TTGAAGCAGT AAAACCACTT TTAACTAAAG 8100 

AGAtAGAAGG CTTTGGGGAA TTGTTTAGAT ATTTGAGTTA TGTTGAAGAT GTTGGCACGC 8160 

GTGCATTATT GTCTCGTGCT GTAGCAGGTA CAGTTAATAA TAAATTGATA TTTTCGATTC 8220 

CAGGATCAAC AGGCGCAGTT AAATTAGCAT TAGAAAAGCT CATTAAACCA GAATTAAATC 8280 

ATCTGATTCA TGAGCTTACA AAATAATTTA TTGATTTGAT TGGCGTTGAA AATCTCCAGA 8340 

TTTACCGCCA GACTTGCTTT CAAGGTAGGT TTCGCCAATA ATCATACCTT TATCAACTGC 8400 

TTTCGTCATG TCGTAAATGG TTAAAGCCGT TGCTGATGCA GCGGTTAAAG CTTCCATTTC 8460 

AACACCGGTT TTGCCAGTTG TAGAGACAGT TGTTTGAATG TTTAAAGTAT AAAGGGGTGC 8520 

ATTTGTTTCA TCCCAGCTGA AGTGAACATC TATGCCAGTC AATGGTAATG GATGGCACAT 8580 

CGGAATAAGT GTTGATGTAT TTTTGGCAGC CATAATACCA GCGATTTGAG CAGTGTTCAA 8640 
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AATGCTTGAA TGAGCGACAG CAGTTCTTTT TGTAATTTOT TTOTCTGATA CATCGACCAT 8760 

TTTGOCGTGG CCTTGTTGAT TAATATGAGT AAACTCAGTC ATTTTACCCC TCCTAGTGCA 8820 

TCTAGTATAT CATGAAAAAA TAAAAGTTTT GGAGATGATT TTTAATGGTA GTAGAAAAAA 8880 

GAAACCCAAT CCCAGTTAAA GAAGCAATTC AACGTATCGT TAATCAGCAG AGTTCAATGC 8940 

CGGCAATTAC GGTAGCACTT GAAAAAAGTC TAAATCATAT CTTAGCAGAA GATATTGTAG 9000 

CTACTTATGA TATACCAAGG TTTGATAAAT CACCTTATGA TGGTTTTOCA ATTCGCAGTG 9060 

TTGATTCACA AGGGGCAAGT GGTCAGAATC GCATTGAGTT TAAAGTGATT GATCATATTG 9120 

GTGCAGGTTC AGTTTCTGAT AAATTAGTTG GGGATCACGA AGCGGTGCGT ATTATGACTG 9180 

GAGCACAAAT ACCTAATGGC GCAGATGCTG TTGTTATGTT TGAACAAACG ATTGAACTAG 9240 

AAGATACATT TACAATTCGT AAACCATTTT CAAAAAATGA AAATATATCT TTAAAAGGTG 9300 

AAGAAACAAA GACAGGCGAT GTTGTTCTAA AAAAAGGACA AGTAATTAAT CCAGGGGCTA 9360 

TCGCGGTCCT TGCAACATAT GGCTATGCAG AGGTTAAAGT TATTAAGCAA CCGAGTGTCG 9420 

CTGTTATTGC AACAGGAAGC GAATTATTAG ATGTTAATGA TGTATTAGAA GATGGGAAAA 9480 

TTCGTAACTC TAATGGCCCA ATGATTCGTG CCTTAGCAGA AAAATTAGGT CTTGAAGTTG 9540 

GTATTTACAA AACACAAAAA GATGATTTAG ATAGTGGCAT CCAAGTCGTT AAAGAAGCTA 9600 

TGGAAAAACA TGATATCGTT ATTACAACGG GCGGAGTTTC TGTTGGAGAT TTTGACTATT 9660 

TACCTGAGAT TTATAAGGCT GTAAAGGCGG AAGTGTTATT TAATAAAGTA GCAATGCGTC 9720 

CTGGTAGCGT AACAACGGTT GCATTTGTAG ATGGaAAGTA TTTGTTTGGa TTATCTGGAA 9780 

ATCCATCAGC TTGTTTTACA GGATTTGAAC TATTTGTGAA nCCAGCTGTT AAACATATGT 9840 

GTGGCGCACT AGAAGTCTTC CCGCAAATAA TTAAAGCAAC ATTAATGGAA GATTTTACCA 9900 

AGGC^AACCC ATTCACACGA TTTATACGTG CTAAAGCAAC GTTAACAAGT GCTGGAGCTA 9960 

CTGTAGTACC TTCAGGATTC AATAAATCAG GTGCGGTTGT AGCGATTGCA CATGCTAACT 10020 

GTATGGTCAT GTTACCAGGA GGGTCACGTG GTTTTAAAGC GGGGCATACA GTAGATATTA 10080 

TATTGACTGA ATCTGACGCT GCTGAAGAGG AACTTCTTTT ATGATTTTAC AAATTGTAGG 10140 

TTACAAAAAG TCTGGTAAGA CAACATTGAT GAGGCATATT GTCTCTTTCT TAAAGTCACA 10200 

TGGTTATACA GTTGCTACTA TTAAACATCA TGGGCATGGT AAGGAAGATA TTCAATTACA 10260 

GGATTCAGAC GTCGATCACA TGAAGCATTT TGAAGCGGGG GCAGATCAAA GTATTGTACA 10320 

AGGTTTTCAA TATCAGCAAA CTGTAACACG TGTAGATAAT CAAAATCTTA CTCAAATTAT 10380 

TGAAAAATCT GTTACAATTG ACACCAATAT CGTATTAGTT GAAGGCTTTA AAAATGCTGA 10440 
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GAATGTTTGT TATAGCATTA ATGTAAGGGA GCATGAAGAT TTTACAGCAT TTGAGCAATG 10560 

GTTATTAAAT AAAATTAAAA ATGATTGTGA TACACAATTA ACATAGAGGA TTGAAATGAA 10620 

TGAAACAATT TGAAATCGTG ACAGAACCGA TACAAACAGA ACAATATCGT GAATTCACTA 10680 

TAAATGAATA TCAAGGTGCA GTAGTTGTTT TTACCGGTCA TGTTCGCGAA TGGACTAAAG 10740 

GCGTCAAAAC GGAATATTTA GAATATGAAG CGTATATTCC AATGGCTGAA AAGAAATTGG 10800 

CACAAATTGG AGATGAAATA AATGAAAAAT GGCCTGGAAC GATAACGAGT ATTGTTCATA 10860 

GAATAGGGCC ATTACAAATT TCAGATATCG CTGTATTAAT TGCGGTTTCT TCACCGCATC 10920 

GTAAAGATGC CTATCGAGCA AATGAATATG CAATTGAGCG TATAAAAGAA ATTGTTCCGA 10980 

TTTGGAAAAA AGAAATTTGG GAAGATGGTT CAAAATGGCA AGGGCATCAA AAAGGGAATT 11040 

ATGAAGAAGC AAAGAGGGAG GAATAAGAGA GATGAAGGTA CTTTACTTCG CAGAAATTAA 11100 

AGATATATTA CAAAAAGCAC AGGAAGATAT TGTGCTTGAA CAAGCATTGA CTGTACAACA 11160 

ATTTGAAGAT TTATTGTTTG AACGTTATCC GCAAATCAAT AATAAAAAGT TTCAAGTTGC 11220 

TGTAAATGAG GAATTTGTAC AAAAATCGGA TTTCATTCAA CCTAATGATA CTGTTGCATT 11280 

AATTCCACCG GTTAGTGGAG GTTAAGGGAG CATGAAAGCA ATAATTCTTG CAGGTGGTCA 11340 

TTCAGTGCGA TTTGGTAAGC CCAAAGCTTT TGCGGAAGTG AACGGTGAGA CCTTTTATAG 11400 

TAGAGTAATT AAGACATTAG AATCAACAAA TATGTTCAAT GAAATTATTA TTAGTACAAA 11460 

TGCGCAATTG GCAACGCAAT TTAAATATCC AAATGTTGTT ATAGATGATG AGAATCATAA 11520 

TGATAAAGGT CCATTAGCAG GAATTTATAC AATCATGAAG CAACATCCTG AAGAAGAATT 11580 

GTTTTTTGTC GTTTCTGTTG ATACACCAAT GATTACTGGT AAAGCTGTAA GCACGTTGTA 11640 

TCAGTTTTTA GTTTCTCATC TTATTGAAAA TCATTTAGAT GTCGCAGCTT TTAAAGAAGA 11700 

TGGADGTTTT ATTCCAACAA TTGCATTTTA TAGTCCGAAT GCATTAGGCG CTATAACTAA 11760 

AGCACTACAT TCTGATAATT ACAGTTTTAA AAATGTATAT CATGAATTAT CAACGGATTA 11820 

TTTGGATGTA AGGGATGTAG ATGCGCCCTC ATATTGGTAC AAAAATATAA ATTATCAGCA 11880 

TGATTTGGAC GCTTTAATTC AAAAATTGTA AGCTGTTAGG AGGTCCACAA ATGGTAGAAC 11940 

AAATAAAAGA TAAACTAGGA CGTCCCATCC GTGACTTACG GTTATCTGTG ACAGATCGGT 12000 

GTAACTTTAG GTGTGATTAT TGCATGCCTA AAGAGGTATT TGGAGATGAT TTCGTATTTT 12060 

TACCTAAAAA TGAACTTTTA ACGTTTGATG AAATGGCTAG AATCGCTAAG GTATATGCAG 12120 

AATTAGGTGT AAAAAAAATA CGCATTACAG GTGGAGAACC ATTGATGCGA CGGGATTTAG 12180 

ATGTACTTAT AGCTAAATTA AATCAAATCG ATGGTATTGA AGATATTGGT TTGACTACAA 12240 
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ATGTCAGTTT GGATGCTATT OATGATACGC TATTTCAATC AATCAATAAT CGTAATATTA 12360 

AAGCGACTAC GATTTTAGAA CAAATTGATT ACGCGACGTC TATTGGTTTG AATGTAAAAG 12420 

TAAATGTTGT TATACAAAAA GGTATTAACG ATGATCAAAT CATACCAATG CTTGAATATT 12480 

TTAAAGATAA ACATATAGAG ATTCGATTTA TAGAATTTAT GGATGTTGGT AATGATAATG 12540 

GATGGGATTT CAGTAAAGTT GTAACTAAAG ATGAAATGCT TACAATGATA GAGCAGCACT 12600 

TTGAAATCGA TCCTGTAGAA CCAAAATATT TTGGGGAAGT AGCAAAATAT TATCGCCATA 12660 

AGGATAATGG TGTTCAATTT GGTTTGATTA CAAGTGTTTC ACAATCATTT TGTTCTACAT 12720 

GTACACGCGC AAGGCTGTCA TCAGATGGGA AGTTTTACGG ATGTTTATTT GCAACTGTCG 12780 

ATGGATTTAA CGTTAAAGCG TTTATTCGTT CTGGCGTGAC CGACGAAGAA TTAAAAGAAC 12840 

AATTTAAAGC TTTATGGCAA ATAAGAGATG ATCGATATTC AGATGAGAGA ACTGCTCAAA 12900 

CAGTTGCCAA TCGTCAACGT AAAAAGATAA ACATGAATTA TATTGGTGGT TAATGTGTAG 12960 

GGACCACTAC ATATTAAATC ATTAGAGATG TTTTAATATT TCTGTCTTAC TCCCTAAAAT 13020 

ACAATATTAT TTATTAAAGT AAAAACGGTC ATATCTATGC CAGATTTAAT AGAAATGATC 13080 

GTTTTTAAAG TTTTTACAAG TTGGCGGGGC CCCAACACAG AAGCTGACAG AAAGTCAGCT 13140 

TACAATAATG TGCAAGTTGG CGGGGCCCCA ACATAGAGAA TTTCAAAAAG AAATTCTACA 13200 

GACAATGCAA GTTGGGGAAC GGGGCCCCAA CACAGAAGGT GACGAAAAGT CAGCATACAA 13260 

TAATGTGGAA GTTGGCGGGG CCCCAACATA GAGAATTTCA AAAGAAATTC TACAGACAAT 13320 

GCAAGTTGGG GATCAACGAA ATAAATTTTA TGAGAATATC ATTTCTATCC CACTCTTAAG 13380 

AATCACTACA TAATAAATCT TTAGTGGTTC TTTAACATTG ATGTCACACT CCATGCCATT 13440 

GAGTTGTAAT ATATCTTTTT TAGGTATAAA TGTTGTCGAA TAAAGAACAA GTTGTCCAAA 13500 

AGATATAAAT CTAAACAAGA TATAGCCAGC AATTTAATAT TTGTAATAGA TAAAATGCTA 13560 

AGTTTGATAT ATAATAAATT TAAGTAATTG TATAATAATA TGAATTACAA ACATCTAAGA 13620 

AGAAACATAG GAGGCATCAT ATTATGAGTA ATAAAGTTCA ACGTTTTATA GAAGCAGAAA 13680 

GGGAGTTAAG TCAGTTAAAG CACTGGTTAA AAACAACACA TAAGATTTCA ATTGAAGAAT 13740 

TTGTAGTCCT TTTTAAAGTG TATGAAGCTG AAAAGATTAG CGGTAAAGAA TTGAGGGATm 13800 

CATTACATTT TGAAATGCTA TGGGATACAA GTAAAATCGA TGTGATTATC CGTAAAaTCT 13860 

ATAAAAAAGA GCTTATTTCT AAATTGCGTT CTGAAACGGA TGAAAGACAA GTATTCTATT 13920 

TCTATAGTAC TTCTCAAAAG AAATTGTTAG ATAAAATTAC TAAAGAAATA GAAGTGTTAA 13980 

GCGTTACAAA CTAAAAACTT aAAAAgcaTG CCAATCTCTA TTCATCATAA TTGCGTCTTG 14040 
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
GTTCATGGCA TTTCTAGTTA CATGACGTCC ATGAATTAAG AAGTAAACAA GCATAGTAAT 14160 

GATTGCTAAA GCGGCCATAA AGCCGAAGAT TTCACTATAT GAAAACATAT GAGTAAATAA 14220 

CCCAAGGAAT GATGGACCGA AGCCGACACC TGCATCTAGA CCAACGTAAA AAGTAGATGT 14280 

CGCGATACCA TATTTAATCG GGGGTGAGAC TTTTATCGCA ATAGATTGCA TTGCAGATGA 14340 

TAAATTTCCA TACCCTAAAC CTAGGCAAGC ACCAGCAAGT AATATTAACC AGCTTTQATA 14400 

GCTTGAAATT AAGCATACAA ATGAAAGGAA AAGCATGATA AATGCTGGGT AGACAATAAT 14460 

ATTTTCATTT TTATCATCCA TCAATCTACC AGCAATAGGT CTAGTAATTA ACGATGCTAT 14520 

AGCATAGCAA ATAAAGAAAT AGCTTGCTGC AGTGACTAGG TGTCGCTCTA AAGCAAATGC 14580 

TTGTAAATAA GTTAGGATGG ACGCATAGGT AACGCCAATT AAAAGCATAA TTACAGCAAC 14640 

AGGAATGGCC TCTTTTGCAA TAAATTGATG AATACTAAAT CTTGGTTTAT CAATGACATT 14700 

AGTTTCAGTT TTGTTATTTG TTACTTCGAA ATCAACTTTT ATAAATAATG AGATAATGAG 14760 

TCCGAGTATG CCTAATATGA CACAAATAAT AAACAGTAAG TCAATTGCGT ATTTTGTAAT 14820 

AAGTAACATG CCTAGAAATG GGCCAATCGC TGTACCTAAT ACTAAACTTA AGGAAAATAA 14880 

ACTGATGCCT TCACTTTTTC TATTAACAGG GGTAACGTAT GCCGCAATAG TACCTGTTGC 14940 

AGTTGTCACA ACTGCAGTTG CGATACCGTT TATGAGACGT ACAAAGATTA AAAAAGCTAA 15000 

AGATCCATCA ATAAAATAAA GTAATTGCGT GATAATTAAA GCAATTAAAC CAATAAATAA 15060 

TAATCGTTTA GGTCCrATTT SATTTACAAA TTTACCTGTA GCAAATCGA 15109 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9072 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

GAGAGTCAAT GGCAAGAAGA ATATAAATAT TTGAGAGCGT TAATCTTTAA TGAAACAGAA 60 

TTAGAGGAAG CGTATAAATG GATGCATCCT TGTTACACGT TGAATAATAA AAATGTAGTA 120 

CTTATCCATG GCTTCAAAAA TTATGTTGCA CTATTATTTC ATAAAGGTGC CATTTTGGAG 180 

GATAAATATC ATACACTCAT TCAACAGACT GAAAAGGTGC AAGCAGCTCG TCAGTTACGA 240 

TTTGAAAATT TAACAGAGAT TCAAGCACGT ACCGAAGAAA TTAAATATTA TCTAGCCGAA 300 

GCAATTAAAG CTGAAAAAGC TGGTAAAAAA GTTGAAATGA AGAAAACAGA GGAATATGTT 360 
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

AAATTAACGC CAGGCAGACA ACATCAATAT ATATATCATA TTGGACAAQC TAAACGCAgT 480 

GgAACAAGAC AAAAGCGTGT TGAAAAGTAT ATTAACCAAA TACTAGAAGG TAAAGGGATG 540 

CATGATAAGT AATTAATGAG TAAAGCATAC CGGTTATACA ACAACATACA AGATGACACG 600 

AAACAACCAA TGGCTCATGC TGTTGGTTGT TTTTTTAGGT GTGTCTGTCA TGGGCAACAC 660 

TTTGACGTTG GAATTCCGTT ACAGGCTTGG GAGTAGAAAA TGTTAGCAAA AGGCAAGGGT 720 

GTCTACAATG AATGATGAAG ATATTAAAAT ATAAGGATGA CTTTGTGAGT GOCGGATGGG 780 

CGGTTGTCCG TCTGTAACAA TGGATGCGTG TGCATTATTA CAAAAATTCG ACTTTTGTAA 840 

TAATATTTCA CATTTTCGAC ACTTTTTTGC TATAAAACAA CCAATTGAGC GATAATAAAT 900 

TCGCTTTTAA AAAATATGAG TTATCTATTT AGTTGCCAAA GATAAAATAA TAATGTTTAA 960 

TAACATCATA TAGAGTATGT TAGTTTTAAA TGTCGAATAT ACGAATGTGc AAACAAAGTA 1020 

ATCGGTAGAA ATTCAACATA CATAGCGCCG TTTACTGTTA AGTATTCACA TTACAGATGA 1080 

AAAATATAAA ATTCTACATA ATCAAGACCA TGATGTGTAC TTGTTTAACT TATGACTCTA 1140 

TTTGTTTAAC AATTGCGATA ATGGTCTTTT TATTTTATGC GTATCATTCG TCATATTTTT 1200 

TATGAGGAAG GAGAAATGAT TATGTTAAGT ATTAAGCATT TAACGAAAAT TTATTCTGGT 1260 

AATAAAAAGG CAGTAGATGA CATCTCTTTA GATATTCAAT CTGGGGAATT TATCGCATTT 1320 

ATTGGAACCA GTGGAAGTGG CAAAACG^CT GCTTTAAGAA TGATAAACCG TATGATTGAA 1380 

GCGACAGAAG GACAAATTGA AATTGATGGT AAAGATGTTC GGAGTATGAA TCCTGTCGAA 1440 

TTGCGTAGAA ATATTGGCTA TGTTATTCAA CAAATTGGCT TAATGCCTCA TATGACGATT 1500 

AAAGAGAATA TTGTGTTGGT ACCCAAATTG TTGAAATGGA CTAAAGAGGA AAAGGATAAA 1560 

CGTGCAAAGG AATTAATTAA ACTTGTGGAT TTACCGGAGT CATTTTTAGA GCGTTATCCA 1620 

GCAGAACTAT CAGGTGGGCA ACAACAACGT ATCGGTGTTG TAAGAGCACT TGCGGCCGAA 1680 

CAAGATATTA TTTTAATGGA TGAACCTTTT GGTGCATTGG ATCCTATTAC GAGAGATACG 1740 

TTACAAGATT TAGTTAAAAC GTTACAACGA AAATTAGGCA AGACGTTTAT CTTTGTAACA 1800 

CATGATATGG ATGAAGCGAT TAAATTAGCA GACAAAATTT GTATTATGTC AGAAGGTAAG 1860 

GTGGTGCAAT TTGATACGCC AGACAATATT TTAAGACATC CCGCAAATGA TTTTGTACGT 1920 

GATTTTATAG GACAAAATAG ACTGATTCAA GACCGTCCCA ATGACAAGAC TGTAGAAGGT 1980 

GTAATGATTA AACCAATCAC GATACAAGCA GAAGCAACAC TGAATGACGC CGTTCATATT 2040 

ATGAGACAAA AACGTGTTGA TACTATTTTT GTAGTAGATA GTAATAACCA TTTACTAGGT 2100 

TTCTTAGACA TTGAAGATAT AAATCAGGGT ATACGTGGAC ACAAAAGTTT ACGAGACACC 2160 
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ATTTTAAAAA GAAACGTTAG GAATGTACCT GTCGTAGATG ATCAACAGCG TTTAGTAGGA 
CTGATTACGC GTGCCAATGT TGTTGATATT GTATATGACA CGATTTGGGG CGATAGTGAG 
GATACAGTGC AAACAGAACA TGTGGGGGAA GACAcTGOGT CCTCAAAAGT GCATGAGCAA 
CACACTACTA ATGTCAAAGT ACGTGACATA GGAGATGATA AATCATGATT GAGTTCCTAC 
ATGAACATGG TGGACAGTTG ATGTCGAAAA CACTGGAACA TTTCTATATT TCTATAGTGG 
CATTATTACT TGCCATCATT GTTGCAGTAC CTATAGGCAT TTTATTATCA AAAACAAAGC 
GAACTGCCAA TATTGTATTA ACTGTGGCAG GTGTCTTACA AACTATTCCA ACACTAGCTG 
TACTTGCTAT TATGATACCG ATTTTTGGTG TTGGTAAAAC GCCTGCAATT GTAGCGCTAT 

TTATTTATGT ATTATTACCT ATTTTAAATA ACACGGTACT CGGTGTTCAA AATATTGATA 
GCAACATTAA AGAAGCTGGA AAAAGTATGG GAATGACACA ATTTCAATTG ATGAAGGATG 
TTGAATTGCC GTTAGCATTG CCGCTTATCA TTGGTGGCAT TCGTTTGTCA TCTGTGTATG 

TAATTAGTTG GGCTACACTT GCAAGTTATG TAGGTGCGGG TGGATTAGGT GATTTCATTT 
TCAATGGTTT AAATTTATAT GATCCACTGA TGATTGTAAC TGCAACGGTA CTCGTTACTG 
CACTAGCATT AGGTGTTGAT GCCTTATTAG CTTTAGTTGA AAAATGGGTA GTTCCCAAAG 

GCTTAAAAGT ATCTGGATAA TTAGGAGGCT AAGATAATGA AGAAAATTAA ATATATACTT 
GTCGTGTTTG TCTTATCGCT TACCGTATTA TCTGGATGTA GTTTGCCCGG ACTAGGTAGT 
AAGAGCACGA AAAATGATGT CAAAATTACA GCATTATCAA GAAGCGAATC GCAAATTATT 
TCACATATGT TACGGTTGTT AATAGAGCAT GATACACACG GTAAGATAAA GCCAACATTA 
GTAAATAATT TAGGGTCAAG TACGATTCAA CATAATGCCT TAATTAATGG GGATGCTAAT 
ATATCAGGTG TTAGATATAA TGGCACAGAT TTAACGGGAG CTTTGAAGGA AGCACCAATT 
AAAAATCCTA AGAAAGCAAT GATAGCAACA CAACAAGGAT TTAAAAAGAA ATTTGATCAA 
ACGTTTTTTG ATTCGTATGG TTTTGCGAAT ACGTATGCAT TCATOGTAAC GAAGGAAACC 
GCTAAAAAAT ATCATTTAGA GACAGTTTCA GATTTAGCAA AGCATAGTAA AGATTTACGT 
TTAGGTATGG ATAGTTCATG GATGAATCGT AAAGGCGATG GCTATGAAGG ATTTAAAAAA 
GAGTATGGTT TTGACTTTGG TACAGTGAGA CCAATGCAAA TAGGTCTAGT CTACGACGCA 
TTAAACTCAG AGAAGTTAGA CGTTGCATTA GGTTATTCTA CAGATGGTCG AATTGCGGCG 
TATGATTTGA AAGTACTTAA AGATGATAAA CAATTTTTCC CACCTTATGC TGCGAGTGCT 
GTTGCAACAA ATGAATTATT ACGGCAACAC CCAGAACTTA AAACGACGAT TAATAAGTTG 
ACAGGAAAGA TTTCGACTTC AGAGATGCAA CGCTTGAATT ATGAAGCGGA TGGTAAAGGT 



2280 
2340 
2400 
2460 
2520 
2580 
2640 
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AAAGGTGGTC ATAAGTAATG GAAGGTAATT TATTACAGCA ATTATTCAAT TATTATGTTA 4080 

CGAACTTTGG TTATCTATGG GATTTATTTT TCAAACACTT ATTAATGTCT GTCTATGGTG 4140 

TGCTGTTTGC AgCTTTAATT GGTATTCCAT TGGGAATCTT GCTTGCaAGA TACACAAAAC 4200 

TTTCTGGATT TGTAATTACA ATTGCAAATA TAATTCAAAC AGTTCCAGTC ATTGCAATGT 4260 

TAGCTATTTT AATGTTAGTC ATGGGCTTAG GTTCAGAAAC AGTAGTTTTA ACAGTGTTTT 4320 

TATATGCGTT ACTTCCAATT ATAAAAAACA CTTATACTGG TATAGCTAGT GTTGATGCGA 4380 

ATATTAAGGA TGCTGGCAAA GGTATGGGAA TGACACGCAA TCAAGTGCTA CGAATGATTG 4440 

AATTACCGTT ATCTGTTTCG GTTATTATCG GTGGCATTCG TATTGCCTTG GTTGTTGCGA 4500 

TAGGTGTTGT TGCCGTTGGA TCATTTATAG GAGCACCTAC GCTTGGTGAC ATTGTGATTC 4560 

GTGGTACAAA TGCGACGGAT GGCACAACGT TTATTTTAGC AGGTGCGATT CCGATTGCTA 4620 

TCATTGCAAT CGTCATTGAT GTACTATTAA GATTTTTAGA AAAACGATTA GACCCAACAA 4680 

CACOACATCG TAAAAATCAA TCTAATCATC GGCCGCAAAG TATTAATATG TAATAGTAGA 4740 

AGATGTTTAT AATTTAGCGA TTTCGTTTCA TGATTTATAA AAAATGAGGC TACTCAAGGA 4800 

GCTCAAATAA TCTTTGAGTA GCCTTTTTAT AGGTTGTGTT TGTATGCGTT TACACTAAAA 4860 

TAGCAATTAT TATCATGAAA GTTTTTGGAT AAAAAGCGTT AATTATTGTA AAAATACTAA 4920 

AAAATGAGAT GTTTTATTTA TAATTTTCTG CAAATTTATG ATATTGTTTC TTAATATATC 4980 

ATATTAAAAA TTTGTTTTTC TTAAACATAG GAGGCTTATC TAA1TCATGG ACACATCAAA 5040 

ACAATTTAGA GGTGACAACC GATTGCTTTT GGGTATCGTT TTAGGGGTTA TTACCTTTTG 5100 

GCTATTCGCG CAGTCACTTG TTAATCTTGT TGTCCCATTA CAATCAACAT ATAGTAGTGA 5160 

CGTTGGAACG ATAAATATCG CTGTTAGCTT ATCTGCCTTA TTTGCTGGTT TGTTTATCGT 5220 

AGGTGCTGGT GATGTTGCTG ATAAATTTGG TCGCGTCAAA ATTACTTATG TAGGATTGAT 5280 

ATTAAATGTT GTAGGTTCAT TACTCATCAT CATTACACCT TTGCCAGCAT TTTTAATTAT 5340 

AGGTAGAATA ATTCAAGGTT TGTCTGCAGC ATGTATTATG CCATCAACAC TTGCTATTAT 5400 

TAACGAATAT TATATTGGTA CAAGAAGACA ACGTGCCTTA AGCTATTGGT CTATTGGTTC 5460 

TTGGGGTGGT AGTGGTATTT GTACGTTGTT TGGTGGCTTA ATGGCTACAT ATATAGGTTG 5520 

GCGTTCAATA TTTGTTGTTT CAATTCTATT AACATTATTA GCAATGTACT TAATCAAACA 5580 

TGCACCTGAG ACTAAAGCAG AACCAATCAA AGGTATGAAA GCAGAAGCTA AAAAGTTTGA 5640 

CGTTATTGGT TTAGTCATTT TAGTAGTGAC GATGTTAAGT TTAAATGTAA TCATCACACA 5700 

GACGTCTCAT TTTGGTTTAG TTTCACCGTT AATTCTAGGT TTAATTGTTG TGTTTATCTG 5760 
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AATTTTTAAA AATAGAGGAT ACAGTGGTGC AACTATTTCA AACTTCTTAT TAAATGGTGT 5880 

AGCAGGTGGT GCACTTATCG TTATTAACAC GTATTATCAA CAACAATTAG GATTTAATTC 5940 

TTCGCAAACG GGTTATATTT CATTAACGTA TTTAATAACA GTGTTGTCAA TGATTCGTGT 6000 

AGGTGAAAAG ATTTTATCTC AACATGGTCC GAAGCGCCCA CTATTACTAG GAAGTGGCTT 6060 

TACAGTGATT GGGTTAATCT TATTGTCGTT AACATTTTTA CCAGAAGTGT GGTATATCAT 6120 

ATCTAGTATA GTTGGATATT TATTGTTTGG TACTGGTTTA GGATTATATG CTACACCATC 6180 

AACTGATACA GCAGTTGCTA GTGCGCCAGA TGATAAGTCG GGTGTTGCTT CAGGTGTGTA 6240 

TAAAATGGCG TCATCATTAG GAAATGCATT TGGAGTAGCA GTATCTGGTA CGGTTTATAC 6300 

TGTGTTAGCA GCTAATTTAA ATTTGAACTT AGGTGGTTTC ACAGGTATGA TGTTTAATGC 6360 

CTTGCTAGCA ATTGTTGCAT 1TTTAGTCAT TTTACTATTA GTTCCTAAAA ATCAAACGAA 6420 

TTTGTAAAAC TGAAATGAAA GCAAGTTATT ATGTAGGGAT TTTAAAGGAA ATTTTGTGAA 6480 

AGTAAGTTTA TCATACACAC TTAATGTTGC GTATTGACGT TTAATGTTAG GTGTGTTCTT 6540 

TTATAGACGA TAAAAGCTGT GTGCATATTA AGCGAATGAT TTTCAAATTG ACGCTAATAT 6600 

GCGAAAGTAG TATTTTTAAA ATGAACAACA ACGATGAAGA GGGGTTTATA GGATGAAAAT 6660 

TGCAATTGCT GGATCGGGTG CATTAGGTAG TGGCTTTGGT GCCAAACTAT TTCAAGCAGG 6720 

ATATGATGTC ACACTTATTG ACGGATATAC ATCTCATGTT GAAGCGGTTA AGCAACATGG 6780 

ATTAAATATA ACGATTAATG GAGAGGCATT CGAGTTAAAC ATTCCGATGT ATCATTTTAA 6840 

TGATCAACCG GACGAAAGCA TTTACGATGT TGTCTTTCTA TTTCCAAAGT CTATGCAATT 6900 

AAAAGAAGTG ATGGAAGATA TGAAGCCACA TATTGATAAT GAAACGATCG TCGTATGTAC 6960 

GATGAATGGT CTGAAGCATG AAGAAGTCAT TGCGCAGTAT GTTGCTCAAT CACAAATTGT 7020 

CAGAGGTGTT ACGACTTGGA CGGCAGGTCT TGAAAGCCCT GGACACAGTC ATTTACTTGG 7080 

TAGTGGACCA GTTGAAATAG GTGAACTAGT GGATGAAGGT AAAGAAAATG TTATAAAAGT 7140 

TGCTGATTTA CTTAACGAAG CGGAATTGAA TGGTGTCATT AGTAAAGATT TATACCAATC 7200 

GATTTGGAAA AAGATTTGTG TTAATGGTAC GGCAAATGCA TTAAGCACAG TGTTGGAGTG 7260 

TAATATGGCA TCGCTGAATG AAAGTAGTTA TGCGAAGTGT TTGATTTATA AATTAACGCA 7320 

AGAAATAGTG CATGTAGCGA CGATTGATAA TGTTCATTTA AATGTTGATG AAGTATTTGA 7380 

ATATTTAGTT GATTTAAATG AAaAAGTTGG TGCGCATTAT CCATCCATGT ATCAAGATTT 7440 

AATTGTTAAT AATAGAAAAA CTGAAATTGA TTATATTAAT GGCGCAGTTG CAACATTAGG 7500 

TAAACAACGT CaTATTGAAG CGCCAGTCAA TCGCTTTATT ACTGATTTAA TTCATACTAA 7560 
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CAATCACGTG ATATTACGGT CATTATTAAG ATTGAAATGT AATAAATAAA GAACAOCAGT 7680 

AAGGTACTTT CAAATTGAAA TGATCTTGGT GCTGTTTTTC TTGATTGATC TTCGTCATAA 7740 

TTCAGATTTG TCATAGGcTA CGACATACTA TTAGTATTTA CTAGACAGTT TTTACGACGA 7800 

CACTTTGAAA AATTTTGAGG CAAATCATTT GGAAGTCTCA CGTGAATTTT GTAAACTCAT 7860 

CAAGCAAGTA ATTATATTAA AAAGACAAAT AGAGAAAAGG TGTTTATAAT GAGTAAAATT 7920 

TTTGTAACTG GTGCAACGGG CCTTATTGGC ATTAAATTAG TTCAAAGACT AAAAGAAGAG 7980 

GGGCATGAGG TTGCTGGTTT TACTACATCT GAGAATGGTC AACAAAAGCT AGCTGCTGTT 8040 

AATGTAAAAG CATATATTGG TGATATATTA AAAGCTGATA CTATTGATCA AGCGTTAGCA 8100 

GATTTTAAAC CAGAAATCAT TATCAATCAA ATTACGGATT TAAAAAATGT TGATATGGCA 8160 

GCAAATACGA AAGTACGTAT TGAAGGTTCT AAAAACCTAA TTGATGCGGC GAAAAAGCAT 8220 

GACGTTAAGA AAGTAATTGC CCAAAGTATT GCCTTTATGT ATGAACCTGG CGAAGGATTA 8280 

GCAAATGAGG AAACTTCACT TGATTTTAAC TCAACTGGCG ATAGAAAAGT AACGGTTGAT 8340 

GGTGTGGTTG GTTTAGAAGA AGAAACGGCT CGTATGGATG AATACGTTGT TTTACGTTTT 8400 

GGCTGGTTAT ATGGCCCAGG TACTTGGTAC GGAAAAGATG GCATGATTTA TAATCAATTT 8460 

ATGGATGGTC AAGTGACACT TTCAGATGGC GTAACATCAT TTGTGCATCT TGATGATGCA 8520 

GTTGAAACAT CTATTCAAGC TATTCATTTT GAAAATGGTA TCTATAATGT AGCAGATGAT 8580 

GCACCTGTTA AAGGTTCTGA ATTTGCAGAA TGGTATAAAG AACAACTTGG TGTTGAACCA 8640 

AATATTGATA TTCAACCTGC GCAACCATTT GAACGTGGCG TAAGCAATGA GAAGTTTAAA 8700 

GCGCAAGGTG GTACTCTGAT TTATCAAACT TGGAAAGATG GCATGAATCC AATTAAATAA 8760 

TAATTTATCC GTTTAATATA CAAAGAATAA AGACTTGGTC GAATCGTGGA TGATATATTA 8820 

TCAAACGCAC GGCTCGAACA AGTCTTTTTT ATTATGTCTT CGTTATCTTT GTATGAAGGA 8880 

ATAACAGAAT TACAATTAAT GTACTGAATA ATGCAATTAA TGTTGTGATT AGTGCTAATT 8940 

TAATTTCTAT TGGTAGCCAA GTCAGTACAA AAGACCAATT ATTGCTACCG AGAATGAGAT 9000 

ATGGTAATGC ATATAATATG AGCGCTAAAG CGATACATAT ACATAATGAT AACCAACTCA 9060 

ATACAGCAAT CC 9072 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16826 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
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

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

GTGGAACAGC TGTAACTATA TCATTTCTTT CAACATTTAT TGGGAAAATG TTAGCTACAT 60 

TTCTATATCC GATTAATAAT GTAGTACTTT CATATATnTC TGTAAATGAA AGTGACAATA 120 

TAAAGAAGCA ATATTTGaAA ACTAATCTAA TTGCTATAGC TGCCCTATGT TTAGTCATGA 180 

TTATATGTTA TCCAATTACA ATAATTATTG TCTCTTTACT GTATAACATT GATTCAAGTT 240 

TATATTCGAA GTTTATTATT TTAGGTAATA TAGGTGTTTT ATTCAATGCA GTGAGTATTA 300 

TGATCCAAAC TTTAAATACA AAACACGCAT CAATAACATT ACAAGCGAAT TATATGACGC 360 

TTCACACGAT TACATTTATA TTCATAACTA TTTTAATGAC AATTGCGTTT GGTCTAAATG 420 

GATTCTTTTG GACAACGCTG TTCAGCAACA TTATTAAGTA TGTGATTTTA AATATTATAG 480 

GTTTAAAGTC TAAATTCATT AATAAAAAGG ACGTCGATTA GATGAGTGAA AAAAAGATTT 540 

TGATTTTATG TCAGTATTTT TATCCGGAAT ATGTATCTTC TGCGACGTTA CCAACTCAAT 600 

TGGCGGAAGA TTTAATTGCG AATCACATTA ATGTCGATGT CATGTGTGGA TGGCCATATG 660 

AATATAGTAA TCATAAACAG GTTTCTAAAA CCGAGATGCA TCGTGGTATT CGCATTCGAC 720 

GTCTCAAGTA TTCGAGGTTT AATAACAAAA GTAAGGTTGG AAGGATCATC AATTTCTTTA 780 

GTTTATTTTC AAAATTCGTG ATTAATATAC CTAAAATGTT GAAATATGAT CAGATTCTTG 840 

TTTACTCTAA TCCACCAATC TTGCCATTAA TACCAGACGT TTTACACAGA CTGCTTAAGA 900 

AAAAATATTC TTTTGTGGTG TATGATATAG CACCTGATAA TGOGATTAAG ACAGGTGCAA 960 

CTCGTCCAGG TAGCATGATT GATAAGCTGA TGCGTTACAT TAATAGACAT GTCTACAAGA 1020 

ATGCTGAAAA TGTCATTGTC CTTGGTACGG AAATGAAAAA CTACTTACTA AATCATCAAA 1080 

TTTCTAAAAA TGCTGACAAT ATCCATGTGA TTCCTAACTG GTATGACATG CGTCAATTAC 1140 

AAGfl££AATCG TATCTATAAT GACACATTTA AAGCTTACCG TGAGCAATAC GACAAAATTT 1200 

TATTGTATAG CGGTAATATG GGGCAGTTAC AGGATATGGA GACACTTATC TCATTTTTAA 1260 

AATTAAATAA GGATCAGTCT CAAACGTTAA CAATACTTTG TGGTCATGGT AAGAAATTTG 1320 

CAGATGTCAA AACGGCAATA GaAGACCATC GTATTGAAAA TGTTAAAATG TTTGAGTTTT 1380 

TAACAGGTAC AGACTATGCT GACGTATTAA AAATTGCGGA TGTATGTATT GCATCGCTGA 1440 

TTAAAGAAGG CGTCGGTTTA GGCGTGCCGA GCAAGAATTA TGGCTATCTT GCAGCTAAGA 1500 

AAGCGTTGGT ACTCATCATG GATAAGCAAT CTGATATCGT TCAACATGTT GAACAATATG 1560 

ATGCGGGTAT CCAAATTGAT AATGGCGATG CACATGCCAT TTATAACTTC ATCAACACTC 1620 

ACTCGAGTAA GGAATTGCAC GAGATGGGTG AGCGCGCACA TCAACTGTTT AAAGATAAAT 1680 
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AAGCGATTAT TCGATGTAGT GAGTTCAATA TATGGTTTAG TAGTTTTAAG TCCGATTCTG 1800 

TTAATTACAG CATTACTAAT TAAAATGGAa TCACCTGGAC CAGCCATTTT CAAACAAAAA 1860 

AGACCGACGA TTAATAATGA ATTGTTTAAT ATTTATAAGT TTAGATCAAT GAAAATAGAC 1920 

ACACCTAATG TTGCAACTGA TTTAATGGAT TCAACATCGT ATATAACAAA GACAGGGAAG 1980 

GTCATTCGTA AGACCTCTAT TGATGAATTG CCACAATTAT TGAATGTTTT AAAAGGAGAA 2040 

ATGTCAATTG TAGGTCCTAG ACCAGCGCTT TATAATCAAT ACGAATTAAT CGAAAAACGT 2100 

ACAAAAGCGA ACGTGCATAC GATTAGACCA GGTGTGACAG GACTAGCTCA AGTGATGGGG 2160 

AGAGATGATA TCACTGATGA TCAAAAAGTA GCGTATGATC ATTATTACTT AACACATCAA 2220 

TCTATGATGC TTGATATGTA TATCATATAT AAAACAATTA AAAATATCGT TACTTCAGAA 2280 

GGTGTGCATC ACTAATGAGA AAAAATATTT TAATTACAGG CGTACATGGA TATATCGGTA 2340 

ATGCTTTAAA AGATAAGCTT ATTOAACAAG GACATCAAGT AGATCAAATT AATGTTAGGA 2400 

ATCAATTATG GAAGTCGACC TCGTTCAAAG ATTATGATGT TTTAATTCAT ACAGCAGCTT 2460 

TGGTTCACAA CAATTCACCT CAAGCAAGGC TATCTGATTA TATGCAAGTG AATATGTTGC 2520 

TGACGAAACA ATTGGCACAA AAGGCTAAAG CTGAAGACGT TAAACAATTT ATTTTTATGA 2580 

GTACTATGGC AGTTTATGGA AAAGAAGGTC ATGTTGGTAA ATCAGATCAA GTTGATACAC 2640 

AAACACCAAT GAACCCTACG ACCAACTATG GTATTTCCAA AAAGTTCGCT GAACAAGCAT 2700 

TACAAGAATT GATTAGTGAT TCGTTTAAAG TAGCAATTGT GAGACCACCA ATGATTTATG 2760 

GTGCACATTG CCCAGGAAAT TTCCAACGGT TAATGCAATT GTCAAAGCGA TTGCCAATCA 2820 

TTCCGAATAT TAACAATCAG CGCAGTGCAT TATATATTAA ACATCTGACA GCATTTATTG 2880 

ATCAATTAAT ATCATTAGAA GTGACAGGTG TGTACCATCC TCAAGATAGT TTTTACTTTG 2940 

ATACATCGTC AGTAATGTAT GAAATACGTC GCCAATCACA TCGTAAAACG GTATTGATCA 3000 

ACATGCCTTC AATGCTAAAT AAGTATTTTA ATAAGTTGTC GGTCTTTAGA AAATTATTCG 3060 

GCAATTTAAT ATACAGCAAT ACGTTATATG AAAATAATAA TGCACTTGAA ATTATTCCTG 3120 

GAAAAATGTC ACTTGTTATT GCGGACATCA TGGATGAAAC GACAACCAAA GATAAGGCAT 3180 

AAGTCATCTA TTAAATAAAA TCAACATACA AATCGTTTTA TTTGGAGGTT ATAGTATGAA 3240 

GTTAACAGTA GTTGGCTTAG GTTATATTGG TTTACCAACA TCAATTATGT TTGCAAAACA 3300 

TGGcGTCGAT GTGCTTGGTG TTGATATTAA TCAGCAAACG ATTGATAAGT TACAAAGTGG 3360 

TCAAATTAGT ATTGAAGAAC CTGGATTACA AGAGGTTTAT GAAGAGGTAC TGTCATCGGG 3420 

AAAATTGAAG GTATCTACAA CGCCAGATGC ATCTGATGTT TTTATCATTG CCGTTCCGAC 3480 
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TAGTATTTTA TCATTTTTAG AAAAAGGAAA TACCATTATT GTAGAGTCGA CAATTGCGCC 3600 

TAAAAOGATG GATGATTTTG TAAAACCAOT CATTGAAAAT TTAGGGTTTA CAATAGGTGA 3660 

AGATATTTAT TTAGTGCATT GTCCAGAACG TGTACTGCCA GGAAAAATTT TAGAAGAATT 3720 

AGTTCATAAC AATCGTATCA TTGGCGGTGT GACTGAAGCT TGTATTGAAG CGGGTAAAOG 3780 

TGTCTATCGC ACATTCGTTC AGGGAGAAAT GATTGAAACA GATGCACGTA CTGCTGAAAT 3840 

GAGTAAGCTA ATGGAAAACA CATATAGAGA CGTGAACATT GCTTTAGCTA ATGAATTAAC 3900 

AAAAATTTGC AATAACTTAA ATATTAATGT ATTAGATGTG ATTGAAATGG CAAACAAACA 3960 

TCCGCGTGTT AACATCCATC AGCCTGGTCC AGGTGTAGGC GGTCATTGTT TAGCTGTTGA 4020 

TCCGTACTTT ATTATTGCTA AAGACCCTGA AAATGCAAAG TTAATTCAAA CTGGACGTGA 4080 

AATTAATAAT TCAATGCCGG CCTATGTTGT TGATACAACG AAGCAAATCA TCAAAGTGTT 4140 

GAGCGGGAAT AAAGTCACAG TATTTGGTTT AACTTATAAA GGTGATGTTG ATGATATAAG 4200 

AGAATCACCA GCATTTGATA TTTATGAGCT ATTAAATCAA GAACCAGACA TAGAAGTATG 4260 

TGCTTATGAT CCACATGTTG AATTAGATTT TGTGGAACAT GATATGTCAC ATGCTGTCAA 4320 

AGACGCATCG CTAGTATTGA TTTTAAGTGA CCACTCAGAA TTTAAAAATT TATCGGACAG 4380 

TCATTTTGAT AAAATGAAGC ATAAAGTGAT TTTTGATACA AAAAATGTTG TGAAATCATC 4440 

ATTTGAAGAT GTATCGTATT ATAATTATGG CAATATATTT AATTTTATCG ACAAATAAAA 4500 

TGTGTCAAAC TAGGGCATAC ATGATTAAGG AAAGATAAGC TGTCATGTGT TTGAACTTCA 4560 

GAGAGGATAA TGTTATGAAA AAAATTATGG TTATTTTCGG TACGAOACCC GAAGCAATAA 4620 

AAATGGCACC ATTAGTAAAA GAAATTGATC ATAATGGGAA CTTTGAAGCG AACATTGTGA 4680 

TTACAGCACA ACATAGAGAT ATGTTAGATA GTGTGTTAAG TATATTTGAT ATTCAAGCTG 4740 

ATCATGATTT AAATATTATG CAAGATCAAC AAACATTAGC AGGCCTTACG GCGAATGCAC 4800 

TTGCTAAACT TGATAGCATC ATTAATGAGG AACAACCGGA TATGATTTTA GTACATGGTG 4860 

ATACTACAAC GACTTTTGTA GGAAGTTTGG CAGCATTTTA TCATCAAATT CCGGTCGGAC 4920 

ATGTAGAAGC TGGACTTCGA ACACATCAGA AATACTCACC ATTTCCTGAA GAGTTAAATC 4980 

GAGTCATGGT AAGTAATATT GCTGAATTGA ATTTTGCGCC AACAGTAATT GCAGCTAAAA 5040 

ATTTACTTTT TGAAAACAAA GACAAAGAGC GTATCTTTAT TACTGGAAAT ACAGTTATTG 5100 

ACGCATTGTC AACAACAGTT CAAAATGATT TTGTTTCAAC GATTATTAAT AAACATAAAG 5160 

GCAAGAAAGT TGTTTTACTA ACAGCGCATC GTCGTGAAAA TATTGGGGAA CCGATGCATC 5220 

AGATTTTTAA AGCAGTAAGA GATTTGGCAG ATGAATATAA AGATGTTGTC TTCATTTATC 5280 
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GGATTGAATT AATTGAGCCA TTAGATGCGA TTGAGTTCCA TAATTTTACA AATCAATCGT 5400
ACCTCGTGCT GACAGATTCT GGTGGTATTC AAGAGGAGGC TCCTACATTT GGAAAACCTG 5460
TGTTGGTATT AAGGAATCAT ACAGAGCGTC CCGAAGGCGT TGAGGCGGGA ACATCGAGAG 5520
TAATTGGCAC AGATTATGAC AATATTGTTC GAAATGTGAA ACAATTGATT GAOGATGATG 5580
AAGCGTATCA ACGTATGAGT CAAGCGAATA ATCCATATGG TGATGGACAA GCATCACGAC 5640
GTATTTGTGA AGCAATAGAA TATTATTTTG GATTGCGCAC AGACAAGCCG GATGAATTCG 5700
TACCTTTACG TCACAAATAA TAAAAAACCC CTAATCATGA AGTTGGTTTA GACAACCAGC 5760
GGTGACTAGG GGTTTTTAAT ATATTTATTT TTGATAGTGG TAGCCAATAT CATATTTGAA 5820
TACTTTATTT GATAATATTG GACTTTGCTG TCCATCGTCA TCACTTTTTA AACGTACATT 5880
TTTATGAGCT TCTTTAAATA CATCGGAATT CAACCAATTA TTAAAGCTAT CTTCAGATTC 5940
CCAAATAGTT AAGATTTTAA CTTCGTCTGT ATCCTCGGTA TTTAATGTTT TAGTGACAAA 6000
CATTTGTTGG AAGCCTTCAA TAGTTTCAAT ACCTTGTCTA TTGTAAAAAC GTTCAATCGT 6060
TTCTTCCGCA CTGCCTTTTT GTAATTGTAA TCTATTTTCT GCCATAAACA TGGGCAATCA 6120
CTCCTCTATT TTATGATTTG ATTTGGGTAA TGTTTTTACA AATGTAAAGA GTACAGCGGT 6180
TTGTATGATA ACCATTATGA TTAATCCTAC ACGGACTGCA AGAACATCCA CCATATAAAT 6240
TGAAAAACCT ATTACAATGT ATAAGCTAAT TAAAATTTTA ATTTTCTGTT GTAGCGTGTA 6300
GCCTCGATGT AAATAAAAGT TTTCTACATA TTCTTTATAA ATTTTTTGAT TAATAAGCCA 6360
ATTGTAAAAG CGATCTGAAC TTCGAGCAAA GCAAAAAACT GCTACGAGTA AAAAAGGGGT 6420
CGTTGGCAGT AAAGGTAATA CGGCACCTGC AATACCAAGC GCTGTAAATA TTAAGCCAAT 6480
GACGATTAAA ATAAGTCGCA TTGAAAAAAC TCCATTCTAG TACTAATGCG CATGTAATAT 6540
TGTTTTAGTA ATATAACTCA TGCTAAATAT AATGTGTATG ATAAGTGCAA TGACTCAGTA 6600
AAATGAAACG ATGTTGAATT ATCCTTGTCA CATTAACGCA TTTTAAGCGC GACTTTCATA 6660
ACAACCAAAC TATTTAATGA GAATTATTCT CAAGTATTAT AGTTATATTA TGTGTTTTAT 6720
TTTTGAAAAG TGCAATATGT TTTCGAAAAT AAGATTATTT TTATGTGCAA AAACGACGCA 6780
AAAGTTTTAA AAATGAGACT TCTGTGAGCT GATTATTTTA TAAAATGTAA ACGCTTACTA 6840
TATAATGTGA ATCATATCGT TTAAAAGCAT TATTAAATAT GATGCTAAGA GATTTATATT 6900
ATAGCCAATA AACAAAGGAG AGATAATATG GCAGTAAACG TTCGAGATTA TATTGCAGAG 6960
AATTATGGTT TATTTATCAA TGGGGAATTT GTTAAAGGTA GCAGTGACGA AACAATCGAA 7020
GTGACTAATC CAGCAACTGG AGAAACACTA TCACATATTA CAAGAGCAAA AGATAAAGAT 7080
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TCAOAACGTG CACAAATGTT GCC3TGATATT 

ATTGCAATGA TTGAAACATT AAATAATGGT 

5 

ATTCCATTTG CTGCAAGACA TTTCCATTAT 

ACAGTGAATG ATATCGATAA AGACACAATG 

GTAGGTGCTG TTGTTGCTTG GAACTTCCCA 

10 

gCCATTGCTG CAGGTAATAC AATTGTGATT 
TTGGAAGTTG CTAAAATTTT CCAAGAGGTA 

15 GGTAAAGGTT CAGAATCAGG TAATGCAATT 

TTTACGGGCT CAACTGATGT AGGTTATCAA 
CCCGCTACAT TAGAGCTTGG TGGTAAAAGC 

20 GACCTTG CAG TTGAAGGTAT TCAGTTAGGT 

GCAGGTTCTC GATTATTAGT TCATGAAAAA 
GAGGCATTTT CAAATATTAA AGTTGGAAAT 

25 CAAACTGGTA AGGATCAATT AGATAAAATT 

GATGCACAAA TTTTAGCAGG CGGTCATCGC 
TTCTTTGAGC CGACATTAAT TGctGTGCCA 

30 ATATTTGGAC CAGTGTTAAC AGTGATTAAA 

GCTAATGATT CTGAGTATGG TTTAGCAGGC 
TTAAATATTG CTAAAGCTGT ACGTACAGGA 

35 

CCAGAAGGCG CACCATTTGG TGGTTATAAA 
GGTGCGTTAA GTAACTATCA ACAAGTTAAA 
AAAGGTTTGT ACTAGAATAA ATATCGTTTC 

40 

AAGTCTTAAC ATTTAACGGC GTTGTTTAGA 
GTATCATGAT ATTAGGATAT AATGACTAAA 
AATCATCTTA CTGCTGTTTT TAATTATGCT 

45 

GTTGTTTATT AATTATGGTG ATTTAGAAAT 
TTTTAATATG CGGAACAATC ATTAAAGTTA 
50 CAATAAATTT GAGATACTTT TTTGTCATTT 

TATTAAAATT TTCTATATGA TAGGAATAAA 

55 



GGTOATAAAT 


TAATGGCACA 


AAAAGATAAA 


7200 


AAACCGATTC 


GTGAGACAAC 


AGCAATTGAT 


7260 


TTCGCAAGTG 


TTATTGAAAC 


AGAAGAAGGT 


7320 


AGTATCGTAC 


GACATGAGCC 


GATTGGCGTC 


7380 


ATGCTATTAG 


CTGCATGGAA 


GATTGCGCCA 


7440 


CAACCTTCGT CTTCAACACC ATTAAGTTTA 


7500 


TTACCTAAAG 


GTGTTGTCAA 


TATACTAACG 


7560 


TTCAATCATG 


ATGGTGTAGA 


TAAATTATCA 


7620 


GTTGCCGAAG 


CTGCAGCAAA 


ACATCTAGTA 


7680 


GCCAATATCA 


TATTAGATGA 


TGCTAATTTA 


7740 


ATTTTATTCA 


ACCAAGGTGA 


AGTATGTAGT 


7800 


ATTTATGATC 


AATTGGTGCC 


ACGTTTACAA 


7860 


CCACAAGATG 


AAGCTACACA 


AATGGGTAGT 


7920 


CAATCATATA 


TTGATGCAGC 


AAAAGAATCA 


7980 


TTAACTGAAA 


ATGGATTAGA 


TAAAGGGTTC 


8040 


GACAATCATC 


ACAAATTAGC 


ACAAGAAGAA 


8100 


GTGAAGGACG 


ATCAAGAAGC 


AATTGATATA 


8160 


GGTGTATTTT 


CTCAAAATAT 


CACACGTGCA 


8220 


CGTATTTGGA 


TTAACACTTA 


CAACCAAGTA 


8280 


AAATCAGGTA 


TCGGTCGAGA AACTTATAAA 


8340 


AATATTTATA 


TTGATACAAG 


CAATGCTTTA 


8400 


TGAAGCGTGT 


TTGTAGGTCA 


GTCTAGCGGT 


8460 


TTTTAAGCAA 


AACAAAATAT 


ATAGGAACAC 


8520 


ATAATAGCAG 


TAGGATGGTT 


TTTAATTGCA 


8580 


AATTTGCGAT 


GCGGCTATTA 


TAAGGACAGA 


8640 


ATGAAGTTCA 


ATATGCAAAG 


TCATCGTTTG 


8700 


TTGCGATTTT 


TTGAACTTAA 


TGAAACTAAA 


8760 


TTATGTAACT 


AACACAATAA 


TCTCGTACAT 


8820 


GCAAAGCGCG 


AGTGTGCTGT 


AAAAGTTTTC 


8880 
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GATGATGTAT AAATCATGGT TAATTACGOA 

GAATTATTTT TAAAAOCGAC AATATTAAAT 

5 ATGAATGGOA AAAAGGCGAA TACGATAAAC 

CAAAAAATTC. AACAAAGTTC TAAAAAGACG 

TTTACAGTGA TTGAATTTGT CGGAGGTTTA 

10 

TCATTTCATA TGCTTAGTGA TGTATTAGCA 

GCAAGTAAAA AGCCGACTGC ACGATACACA 

GCATTTTTAA ATGGTTTAGC ATTAATTGTA 

15 

GTACGTATTA TTTATCCGCA ACCAATTGAA 
GGTTTACTCG TCAATATTAT TTTGACTGTT 

2Q AATATCAATA TTCAAAGTGC ATTATGGCAT 

GTCATCGTTG CAGTTGTATT GATTTACTTT 
AGTATTGTAA TTTCACTCAT CATTTTACGT 

25 tTAATTTTAA TGGAAAGTGT GCCTCAACAT 

AAAAACATAG ATGGCATATT AGATGTACAT 
CATTATTCAT TAAGTGCCCA TGTTGTGTTA 

30 GCGATTG AT C AAGTATCATC ATTGTTGAAA 
CAAATTGAAA ACTTGCAATT GAATCCATTA 
ATAAAACATT GTAGCGCCTA AAACATTAAT 

35 CTT ATGTTG C ATCATTTAAA TGATTTTCGT 
CGACATCTTT AGGTTTCAAA ATATGAATAT 

CTATGATGTA CCTTTGACCG GCCATTGTTT 

40 

TTG CTACGAC AGATTCTTTA TC CATAATG A 

. TACCCTAACA TGATTTTTAT ACTCTTTGAA 

TTAAAAAAAT ATCTTAATAT CCTTGTAATC 

45 

CATtGTTATA GGAGGTCTTA TTAATGACAT 
XTGCATCAAC GAAAGAAGAA CTAGAAGCAA 
CAACATTAAT TGAAGTACAA GCTACTGAAA 

60 

CAAATGACGA aGCAGAAGCT AAACAATTTT 



AGCATTAATA TTAACCTGAG AAG CT AT AAA 9000 

ACGACGCATT TATTTAGGAO TGGCAAACGT 9060 

AGATACAAAT ATTTTCATCA TGTCAATCAT 9120 

CTGTGGGCAT CACTAATCAT CACATTGTTA 9180 

GTATCTAATt CATTGGCATT ACTGTCAGAT 924 0 

CTTGGTTTAT CTATGTTGGC CATTTATTTT 9300 

TTTGGATATT TAAGATTTGA GAT ATT AG CT 9360 

ATTTCAATCT GGATTTTATA TGAAGCTATT 9420 

AGTGGCATTA TGTTTATOAT TGCTAGTATT 9480 

ATCCTTGTAA GGTCTTTAAA ACAAGAAGAC 9540 

TTCATGGGAG ACTTATTGAA CTCTATTGGT 9600 

ACAGGATGGC GCATCATCGA CCCAATCATT 9660 

GGTGGTTATA AAATTACGCG TAATGCgTGG 9720 

TTGGATACTG ATCAAATTAT GGCAGATATT 978 0 

GAATTTCATT TGTGGAGTAT TACAACAGAG 984 0 

GATAAAAAAT ATGAGGGTGA TGATTATCAA 9900 

GAAAAATATG GCATTGCACA TTCAACGTTG 9960 

GATG AG C CAT ACTTCGACAA ATTAACATAA 1002 0 

CTATGTCATA GGCG CACGTT TCGTTTTATA 10080 

CAATTTCTTT GATG CTATCT ACATCTAACA 1014 0 

GTTTTTCATC ATTTGTATGT AAAATGCGTT 10200 

CTACAGCAAT CTTTTTGTTT CT AG CT AAAC 10260 

TAGCCCCCTA TATATATGTT TATTTACTTA 10320 

AATATATTTT ACAGAATTTT ATCTAAATAT 10380 

CGATAAGAAT TATAGTAATA TTTTTTCAAC 10440 

TATTTTTATT AGAAGCTAAC AATCTTGATT 105 00 

AGGCAG CAT C ACTATCTACG AAGACAATTC 10560 

ATTTAACTCA TGGTTATTTT ATTGTGGAAG 10620 

TAACAGAAGC AGATATTAGT ATTCAATTAG 10680 
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ATATAACAAT TCACGATATA AGGGCTGTGT TTGGCATAGC CCTTTAGATA TACACTTAAT 11280 

TCCTATTAAA ATAGTAGGGA TTAAAAGGGG GCTTGTCATG ATTAAAATTC AACAATTACA 11340
ACATCACTTT GGATCACATA AAGTAATTCA TAACTTTAAT TTGGACATTA GCAAGGGAGA 11400
AATAGTCACT TTCATAGGGA AAAGTGGTTG CGGAAAGTCT ACTTTACTCA ATATTATCGG 11460
TGGATTTATT CATCCATCGT CTGGTCGTGT CATTATTGAT AACGAAATTA AACAACAGCC 11520
ATCTCCAGAT TGTTTAATGC TATTTCAACA TCATAATTTG CTGCCATGGA AAACGATTAA 11580
TGACAACATT AGGATTGGAT TACAACAGAA AATTAGTGAT GAAGAGATTA ACGCACAGCT 11640
TAAATTAGTT GATTTAGAAG ACAGGGOAAA GCATTTTCCC GAGCAACTGT CCGGGGGTAT 11700
GAAACAACGT GTGGCACTAT GTCGAGCGCA TGTGCATAAG CCTAACGtTA TATTGATGGA 11760
TGAGCCATTA GGTGCATTAG ATGCATTTAC ACGTTATAAA CTTCAGGATC AACTAGTGCA 11820
aCTAAAACAT AAAACGCAAT CAACTATTAT TTTAGTGACG CATGACATTG ATGAAGCTAT 11880
TTATCTTTCC GACCGCATTG TTCTGTTAGG TGAAGGGTGC AATATTATTT CTCAATATGA 11940
AATTACAGCA TCACATCCAC GCAGTCGTAA TGATAGCCAC CTACTTAAGA TTCGTAATGA 12000
AATTATGGAA ACATTTGCAT TGAATCATCA TCAAGTTGAA CCTGAATATT ATTTATAAGG 12060
AGTGAGTGAC GATGAAAAGG TTAAGCATAA TCGTCATCAT TGGAATCTTT ATAATTACAG 12120
GATGTGATTG GCAAAGGACG TCTAAAGAAC GGTCTAAAAA TGCCCAAAAT CAGCAAGTGA 12180
TTAAAATTGG ATATTTGCCG ATTACACATT CAGCTAATTT GATGATGACT AAAAAATTAT 12240
TATCACAATA CAATCATCCG AAATATAAAC TAGAATTAGT TAAATTCAAT AATTGGCCAG 12300
ATTTAATGGA CGCATTAAAC AGTGGTCGTA TTGATGGTGC ATCAACTTTA ATAGAGCTAG 12360
CGATGAAATC AAAACAGAAG GGCTCAAATA TAAAGGCTGT GGCATTGGGC CATCATGAAG 12420
GCAATGTCAT TATGGGACAA AAAGGTATGC ACTTAAATGA ATTTAATAAT AATGGCGATG 12480
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10 



GTAAACAATT AAAGATTAAA CCGGGGCATT TTAGCTATCA TQAAATGTCG CCAGCAGAAA 12600
TGCCAGCCGC ATTGAGTGAA CACAGAATTA CAGGGTATTC TGTAGCCOAA CCATTCGGTG 12660
CACTGGGTGA AAAGTTAGGC AAAGGTAAGA CTTTGAAACA TGGTGATGAC GTTATACCTG 12720
ATGCGTATTG CTGTGTGCTA GTACTGAGAG GGGAATTGCT TGATCAACAC AAGGATGTAG 12780
CGCAAgCATT TGTACAAGAT TATAAAAAGT CTGGCTTTAA AATGAATGAT CGCAAGCAAA 12840
GTGTAGACAT TATGACGCAT CATTTTAAAC AAAGTCGTGA CGTTTTAACA CAGTCAGCGG 12900
CATGGACATC CTATGGTGAT TTAACAATTA AGCCATCCGG CTATCAAGAA ATTACGACAT 12960
TGGTAAAACA ACATCATTTG TTTAATCCAC CTGCATATGA TGACTTTGTT GAACCGTCAT 13020
TGTATAAGGA GGCATCGCGT TCATGACACG TCCCACAAAT AACAAATTTA TATTACCTAT 13080
TATCACATTT ATTATTTTCT TAGGCATTTG GGAAATGGTC ATTATTATTG GGCATTACCA 13140
ACCTGTATTG TTACCGGGTC CTGCTCTTGT AGGAAAAAGT ATATGGTCTT TCATTGTTAC 13200
TGGAGAAATT TTCCAACATT TAGCAATTAG TTTATGGAGA TTTGTAGCGG GCTTTGTTGT 13260
CGCATTGTTG GTTGCTATTC CATTGGGCTT CTTGCTTGGA AGGAATCGTT GGCTATACAA 13320
CGCTATCGAA CCGCTATTTC AATTGATTAG GCCGATATCT CCGATAGCAT GGGCACCATT 13380
TGTTGTTCTA TGGTTTGGTA TTGGTAGTTT GCCAGCGATT GCGATTATTT TTATCGCTGC 13440
TTTTITCCCA ATTGTGTTCA ATACTATTAA AGGCGTTAGA GACATTGAAC CTCAATATTT 13500
AAAAATAGCA GCAAATTTAA ATTTAACTGG GTGGTCATTG TATCGCAATA TATTATTTCC 13560
CGGGGCATTT AAACAAATCA TGGCTGGGAT ACATATGGCG GTAGGAAGAA GTTGGATATT 13620
TTTAGTTTCT GGTGAAATGA TTGGTGCACA ATCGGGATTA GGTTTTTTAA TCGTTGATGC 13680
ACGAAATATG TTGAACTTAG AAGATGTTTT AGCAGCAATA TTCTTTATCG GATTATTTGG 13740
TTTTATTATT GATCGATTCA TTAGTTATAT TGAGCAGTTT ATACTTAGAA GATTTGGTGA 13800
ATAAGGAGAG ATGATGATGA CTTTAGAAAC GCTTATCAAA GAACAATTAG ATCCTCATTT 13860
AGTAGAAGTT GATGAAGGGA CGTATTATCC GAGAACATTT ATTCAGCAAT TATTTGTAGA 13920
TGGTTATTTC GGTGAGGCGG CATTGAGAAA AAATGCTGAA GTAATCGAAG CTGTATCGCA 13980
GTCTTGTTTG ACAACAGGAT TTTGTTTATG GTGCCAATTA GCTTTTTCAA CGTATTTAGA 14040
AAATGCCACG CAGCCACATT TAAATAATGA CTTACAACAG CAATTGTTAT CTGGAGAAAT 14100
ATTAGGTGCT ACCGGATTGT CTAATCCGAT GAAGTCATTT AATGATTTAG AAAAGTTGAA 14160
CCTTGAACAC ACTTATGTTG ATGGACAATT GGTTGTCAGT GGACGTATGC CAGCTGTAAG 14220
TAATATTCAA GAAGACCATT ATTTTGGTGC GATTTCGAAA CATGAATCAT CAGATGAATT 14280
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10 



TTTAGGAGTC AACGGGTCAG CAACGTATCA AATCACATTG AATCAAGTCG TAGTGCCACA 14400
ATCACAAATT ATCACGCATG ATGCGAAGCA GTTTGCGGCA ACTATTCGCC CGCAATTTAT 14460
TGCTTACCAA ATTCCAATAG GATTAGGCTC AATTAAAAGT TCTTTAGAGT TAATTGATGC 14520
ATTTTCAAAT GTGCAAAACG GAATAAATCA ATATTTAGAG TATGATGTTG AAGCTTTTAA 14580
AAAACGTTAT CGTCAACTTA GAGAGGAATA TTATGCAATA TTAGATGACG GTAACTTAAC 14640
TTCACATTTA AATGAATTAA TATCATTGAA GAAGGACATC GGCTATTTAT TGTTAGATGT 14700
AAATCAAGCT TCTGTTGTCA ATGGTGGTTC TAGAGCGTAC ACACCATATT CGCCACAAGT 14760
TCGCAAGTTA AAAGAAGGAT TCTTCTTCGC AGCATTGACA CCGACATTAA GACATTTAGG 14820
TAAACTTGAA GCAGAGTTGA AGGGGTAAGT GTGATAAGCT GATTTTTTGT TTAGATGCGT 14880
TTGTTGAAAC ATTTTTTAAA ATAATATAAA TCTTAGTTTA TAAACATTTT CTGTTAATTT 14940
GTTATATCCT TTTAACTAGG AAAATATACA TTTCGTAATA ATAATAATCG TTATCATTGA 15000
AAAAGTGTTA ATAAGGTGTA TAATGAAAAT GTGAACAATT AATGAACTTC TTATTTTAAA 15060
GAAGGTGAAT ACTATAGATA CGCATACTAA AGAACAACAA TTCTCGAATC TAGTAAGATC 15120
TTATCGTAAA GAATACGTGG GTAAAGGACC CAATAGTATT CGAGTGTCGT TTAAAGATAA 15180
TTGGGCGATT GCACATATGA CAGGTGTTTT GAGTAAAGTT GAGAGTTTTT ACCTAAACGA 15240
CAAACGCAAT GAATCGATGC TCCATTATAC ACGCACAGAG AAGATTAAAC AGATGTATAA 15300
AGAAATAGAT GTAAATGAGA TGGAAAGTCT TGTAGGCGCT AAGTTTGTAA AATTATTTAC 15360
AGATATTGAT TTGAATGATG ATGAAGTCAT TTCAATATTT GTTTTCGATA AGTCAATAGA 15420
ATAAGTGTTG CTGGTGTAAG GTACACGGTG CTGTTTGCTA ACTTCGCTTT GAATTTAACA 15480
ATAATTCAAG GGGGTGGTAT GTCAAACGGT GCCGTTTTTT TGTCATATTT TTAAAACAAG 15540
CAACATGCAA CACGTACTTT AAGGAAGTCA AAATTTATCA TTTAGGAGAG ATGGATATGA 15600
AAATCGTAGC ATTATTTCCA GAAGCAGTAG AAGGTCAAGA AAATCAATTA CTTAATACTA 15660
AAAAAGCATT AGGATTAAAA ACATTTTTAG AGGAAAGAGG ACATGAGTTC ATTATATTAG 15720
CAGATAATGG TGAAGACTTA GATAAACATT TACCAGATAT GGATGTGATT ATTAGTGCGC 15780
CATTTTATCC TGCATATATG ACTCGTGAAC GTATTGAAAA AGCACCGAAC TTGAAATTAG 15840
CAATTACAGC AGGTGTAGGA TCTGACCATG TAGATTTAGC GGCAGCAAGT GAACACAATA 15900
TTGGTGTCGT TGAAGTTACA GGAAGTAATA CAGTTAGTGT GGCAGAACAT GCGGTTATGG 15960
ATTTATTAAT ACTTCTTAGA AACTATGAAG AAGGTCATCG TCAATCAGTA GAAGGTGAAT 16020
GGAACTTGTC TCAAGTAGGT AATCATGCGC ATGAATTACA ACACAAAACA ATTGGTATTT 16080
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6 



TACAACACTA 


TGATCCAATC 


AATCAACAAG 


ACCATAAATT 


GTCTAAATTT 


GTAAGCTTTG 




ATGAACTTGT 


TTCAACAAGT 


GATGCGATTA 


CAATTCATGC 


ACCATTAACA 


CCAGAAACTG 


16260 


ATAACTTATT 


TGATAAAGAT 


GTTTTAAGTC 


GTATGAAAAA 


ACACAGTTAT 


TTAGTGAATA 




CTGCACGTGG 


TAAAATTGTA 


AATCGCGATG CGTTAGTTGA AGCGTTAgCA TCCGAGCATT 


1 on 


TACAAGGATA 


TGCTGGTGAT 


GTTTGGTATC 


CaCAACCtGC 


ACCTGCTGAT 


CATCCATGGA 


^ C A A f\ 


GAACAATGCC 


TAGAAATGCT 


ATGACGGTTC 


ACTATTCAGG 


TATGACTTTA 


GAAGCACAAA 


16500 


AACGTATTGA 


AGATGGAGTT 


AAAGATATTT 


TAGAGOGTTT 


CTTCAATCAT 


GAACCTTTCC 




AAGATAAAGA 


TATTATTGTT 


GCAAGTGGTC 


GTATTGCTAG 


TAAAAGTTAT 


ACAGCTAAAT 


16620 


AQAATAAGGA 


TGCTGGGCTA 


GCGATTAACG 


CTTTCAATTT 


TATATAAATG 


AATCATATAA 


16680 


GCACTACTGC 


TGTTGTAAAG 


ATGGCAGTAG 


•rrrrriTATG 


ATTACAT CT A 


AGTATAGTCA 


16740 


CGGCTATGTT AGGACAATGA TTTAACATTT AOGCACATAT GTGTTCACTT ACGCAATTAT 16800
TGAnAAATnT CATTCATGTG GnAATC 16826



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4012 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:



TTCAATGAGA GTAGTGGGCT GATGTTTAGC GATATCGCGT AAGATTAACC ATTGGCCATA 60
ATATATATTG TGTTTTTCTA AAATCGGCTC GGCTAATTTT AAATAGGGGC GATATATTGT 120
TATAAAACTA TTGAAAAATT CTTGTGATAG CATAGTGACA TCTCCTAAGA CAAAATAGTT 180
AGCTTAGCTA mCCTTTTTAC AACAATAGTA ATTATAAAAC GGGAGCAATT AGAAATCAAT 240
ATATAATTAT TAAGAGCAAA AATAATTATA CTTTGTTAAA ATAAGCGTAA TTACATGTAA 300
ATAGGGGGAT ACTAATGATA TTGAAATTTG aTCACATCAT TCATTATATA GATCAGTTAG 360
ATCGGTTTAG TTTTCCAGGA GATGTTATAA AATTACATTC AGGTGGGTAT CATCATAAAT 420
ATGGAACATT CAATAAATTA GGTTATATCA ATGAAAATTA TATTGAGCTA CTAGATGTAG 480
AAAATAATGA AAAGTTGAAA AAGATGGCAA AAACGATAGA mGGCGGAGTC GCTTTTGCTA 540
CTCAAATTGT TCAAGAGAAG TATGAGCAAG GCTTTAAAAA TATTTGTTTG CGTACAAATG 600
ATATAGAGGC AGTTAAAAAT AAACTACAAA GTGAGCAGGT TGAAGTAGTA GGGCCGATTC 660



ATATCAAACA CCTCATTGTT AGATTATTGA 

20 

TTAATGTGGT TGCTTGAGGA AAAATTTATT 
TGAATATCGT GTTAGATQAT GAAAGTATAT 

2S AATTGTACGA TAACATTAAA TTTAACACGA 
ATGGGTAAAT TTGAACTTGC TAAACTATTA 
TTCAAATCTT ACACAAGCTC TGMTCGACA 

30 ATTGTTAAAT AGAAGGAGAT ATCATAAATC 
CATGACGCAT TTGTTAAATC CCACCCAAAT 
GAAACAAAGA AATTAACTGG ATGGTACGCG 

35 GTTCAGGGTG TTGCGCAGTT ACTTTTTAAA 
TATATTTCGC GTGGTTTTGT TGTTGATTAT 
GACAGTGCAA AAGAAATTGC TAAAGCTGAG 

40 GTTGAAGTTG ATAAAGGTAC AGATGCTTTG 
AAAGGATTTA AAGAAGGTTT ATCAAAAGAC 
CCAATTGATA AAAATGATGA TGAGTTATTA 

4S 

GTGCGCTTGG CTTTAAAGCG AGGTACGACA 
ACATTTGCTG AGTTAATGAA AATCACTGGG 
AGTTACTTTG AAAATATTTA TGATGCGTTG 

SO 

GTAAAGTTGG ATCCAAAAGA AAATATAGCG 



CATTATAACA GGGGTAATTG TATATGAACA 


1380 


CATTGAAGTC AAGTTGGTTC ATTTTAGAAA 


1440 


TGAAGTATAG GTAACTAGTT GAAAAGTATT 


1500 


AACATAGATA TAAAATGATT CACAATTAAA 


. 1560 


ATTGGAGCAT GGACATTTCA AAAATAAGAG 


1620 


CTATAAGATA CAAACTGTAT AATTAAAGGT 


1680 


ATGGAAAAGA TGCATATCAC TAATCAGGAA 


1740 


GGAGATTTAT TACAATTAAC GAAATGGGCA 


1800 


CGAAGAATCG CTGTAGGTCG TGACGGTGAA 


1860 


AAAGTACCTA AATTACCTTA TACGCTATGT 


1920 


AGTAATAAAG AAGCGTTAAA TGCATTGTTA 


1980 


AAAGCGTATG CAATTAAAAT CGATCCTGAT 


2040 


CAAAATTTGA AAGCGCTTGG TTTTAAACAT 


2100 


TACATCCAAC CACGTATGAC TATGATTACA 


2160 


AATAGTTTTG AACGCCGAAA TCGTTCAAAA 


2220 


GTAGAACGAT CTGATAGAGA AGGTTTAAAA 


2280 


GAACGCGATG GCTTCTTAAC GCGTGATATT 


2340 


CATGAAGATG GAGATGCTGA ACTATTTTTA 


2400 


AAAGTAAATC AAGAATTGAA TGAACTTCAT 


2460 
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CAAAATATGA TTAATGATGC GCAAAATAAA ATTGCTAAAA ATGAAGATTT AAAACGAGAC 2580 

CTAOAAGCTT TAGAAAAGGA ACATCCTGAA GGTATTTATC TTTCTGGTGC ACTATTAATG 2640
TTTGCTGGCT CAAAATCATA TTACTTATAT GGTGCGTCTT CTAATGAATT TAGAGATTTT 2700
TTACCAAATC ATCATATGCA GTATACGATG ATGAAGTATG CACGTGAACA TGGTGCAACA 2760
ACTTACGATT TCGGTGGTAC AGATAATGAT CCAGATAAAG ACTCAGAACA TTATGGATTA 2820
TGGGCATTTA AAAAAGTGTG GGGAACATAC TTAAGTGAAA AGATTGGTGA ATTTGATTAT 2880
GTATTGAATC AGCCATTGTA CCAATTAATT GAGCAAGTTA AACCGCGTTT AACAAAAGCT 2940
AAAATTAAAA TATCTCGTAA ATTAAAACGA AAATAGATTA ACGACTGAAA TCTGAACGCT 3000
CATAAGACTG TCATTTGCGT TCAGATTTTT TTACACAATA TAGAATGGTT GAGTAAAATA 3060
TTTTTGAATA TAGTGAAAGA GGGGOAAGTA CTGTGATAAA AAAGCTATTA CAATTTTCTT 3120
TAGGGAATAA GTTTGCTATC TTTTTAATGG TTGTTTTAGT TGTCTTGGGC GGTGTATATG 3180
CGAGTGCTAA ATTGAAATTA GAATTACTAC CAAATGTACA AAATCCAGTT ATTTCAGTTA 3240
CAACAACAAT GCCGGGTGCA ACGCCACAAA GTACCCAAGA TGAAATAAGT AGTAAAATTG 3300
ACAATCAAGT AAGATCATTG GCATATGTGA AAAATGTTAA AACGCAATCC ATACAAAATG 3360
CTTCAATTGT AACAGTTGAA TATGAAAATA ATACAGATAT GGATAAAGCA GAAGAACAGC 3420
TTAAAAAAGA AATCGATAAA ATTAAATTTA AAGATGAAGT TGGTCAACCA GAATTAAGAC 3480
GTAATTCGAT GGATGCTTTT CCGGTTTTAG CATATTCATT TTCAAATAAA GAGAATGACT 3540
TGAAAAAAGT AACGAAAGTA CTGAATGAAC AATTAATACC AAAATTGCAA ACGGTAGATG 3600
GTGTGCAAAA TGCGCAATTA AATGGGCAGA CGAACCGTGA AATCACCCTT AAATTTAAGC 3660
AAAATGAACT TGAAAAATAT GGGTTGACTG CTGATGATGT AGAAAACTAT CTAAAAACGG 3720
CAACAAGAAC AACGCCACTT GGATTGTTCC AATTTGGTGA TAAAGATAAT CAATTGTTGT 3780
TGATGGTCAA TATCAATCTG TTGATGCTTT TAAAAACATA AATATTCCAT TAACGTGGCA 3840
GGAGGACCAA GGGCATCTCA TCCCAAAGTG ACCATAAACC AAATTCAGCC ATGTCAGACG 3900
TTATCAGGCA TCACCACAGC AAATTCAAAG CGTCAGCnCC AATATATAGT GGATGCCGCA 3960
nGAACTAGGG GTTTAGCGnT ATCAGTGGTG TGGCGACTCT ATTCTAAACG AT 4012

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7778 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear
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(xi) SEQUENCE DESCRIPTION: £ 
CAATATAGGT CGCCGAGTTT CAACTaCATC 
5 AGGTGTAGGT GGCTATCCTA AAGGACGAAT 

TAAGACAACA GTAGCGCTTC ACGCTATTGC 
ATTTATCGAT GCTGAACATG CTTTAGATCC 

10 

CGATAATTTA TATTTATCGC AACCGGATCA 
ATTTGTTAGA AGTGGTGCAG TTGATATTGT 
TAAAGCTGAA ATTGAAGGAG AAATGGGAGA 

1S 

GTCACAAGCG TTACGTAAAC TTTCAGGTGC 
CATCAACCAA ATTCGTGAAA AAGTTGGTGT 

2Q TGGACGTGCA TTAAAATTCT ATAGTTCAGT 

TAAACAAGGA CAAGAAATTG TAGGTAATAG 
GGCACCACCA TTTAGAGTAG CTGAAGTTGA 

25 GGGTCAACTT ATTGATTTAG GTGTTGAAAA 

TTCTTACAAT GGCGAACGAA TGGGTCAAGG 
AAATCCACAA ATTAAAGAAG AAATTGATCG 

30 TGGTGATGTT GAAGAAACAG AAGATGCACC 

AAATTTATAT CTATAGTTAA ACTTAGCAAA 
TCATCTCATA AAGCTAGAAT AATATCTAAC 

35 AAGGTTTTTT ATTTTATTTA TTATTACATT 

TTTAGAAAAT AGTAGAAATA GCATTCAATA 
ACTTATCTCC TATAAACCGT ACAATTAATT 

40 

ATATTGAATT TCATATAAAG AGCAAACCCT 
AGCCTCCTAC TCATTTTGCT GGGGATCATT 
CGAAATTTGT TGCTTCAAAA GCAATCACAA 

45 

CAAGCACATA AAGAAGCTGA CAATATCAAA 
AACCAAATCC TAAGAGAACA AACTGAAGCA 
AGACAAGAAA CCCGACTTCT TCAAAAAGAA 

SO 

GATAAAAAAG ATGAGATTTT AGAGCAAAAA 

55 



ID NO: 48: 



AACTGGTTCA 


GTTACATTAG 


ATAATGOGCT 


60 


TATTGAAATT 


TATGGTCCTG 


AAAGTTCTGG 


120 


TGAAGTACAA 


AGTAATGGCG 


GGGTGGCAGC 


180 


AGAATATGCT 


CAAGCATTAG 


GCGTAGATAT 


240 


TGGTGAACAA 


GGTCTTGAAA 


TCGCCGAAGC 


300 


AfiTTfJT A c 


TfACtTTGfTFG 

X^pAwX Xw\^*V* 


CTTTAACACC 


360 


w\V> X V— nj X x 


VJV3 XXX ** V^W*M 


CTTCGTTTAAT 


420 


1 Ax L il«Xnnn 


Xl_M/v\ XA\a*n/v 


V» X X A X X X i 


480 


TATGTTCGGT 


a&TVf'ii/iAfSA 


/-*tap a rr* Attn 


540 


/ «i i-jv /*» tv 71 


V-J lALu 1 V-A^ X/\T 




600 


AAt_Ti AAAA1 I 


AAAblwl In 


Aftnni/wwtU X 


ecn 

D O w 


TATTATGTAT 


tjuAwtAUuiA 


X A iwl/vlnun 


/ * V 


CGACATCGTT 


Gel 1 AAA x UA<j 


uAuLA Iuvj 1 A 


*7Qfl 
/ 0 w 


TAAGGAAAAT 


GTTAAAATGT 


ACTTGAAAGA 


fl A n 


TAAATTGA&A 


uAAAAAi 1 ALj 


fSTATAffTYlA 




AAA<j IvJAl X.A 


X X luAUJnAv) 


A A T HfSTIk r 1 ap 


960 


TATCCTTAXA 


*^Kj>\1 1 VjM.X x\v 


nnnu X Vjnlnl 


lW4W 


<|M|M|i fv 'IV 7A T* 


APArTAOAAA 




1080 


AX l-AAXft\3 X X 


TTATAATCGA 


• GCTTCAAAAC 


1140 


TAnTfiPAAAA 


GTGCAAATTG 


ATAACTTGAC 


1200 


TGTATGATTT 


ATATATAATT 


TCATAAAGTC 


1260 


AGAAAAGGAG 


GTGTTTGTGT 


GAATTTATTA 


1320 


CTAGGAGTTG 


TTGGAGGGTA 


TGTTGTTGCC 


1380 


GCTAGACAAA 


CTGCCGAAGA 


TATTGTAAAT 


1440 


AAAGAGAAAT 


TACTTGAGGC 


AAAAGAAGAA 


1500 


GAACTACGAG 


AAAGACGTAG 


CGAACTTCAA 


1560 


GAAAACTTAG 


AGCGCAAATC 


TGATCTATTA 


1620 


GAATCAAAAA 


TTGAAGAAAA 


ACAACAACAA 


1680 
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CGCATCTCCG GTCTCACTCA AGAAGAAGCT 
GAACTGTCAC AAGATATTGC AGTACTTGTT 
5 GTTGATAAAA CAGCAAAAGA ATTATTAGCT 

ACAAGTGAAT CAACGGTATC AGTAGTTAAC 
ATTGGACGAG AAGGACGAAA CATCCGCACA 

10 

ATTGATGACA CACCAGAAGC GGTTATATTA 
GCTAGAACAG CACTTGTTAA CTTAGTATCT 
GATATGGTCG AAAAAGCTAG AAAAGAAGTA 

15 

GCTACATTTG AAGTGAACGC ACATAATATG 
TTAAACTATC GTACGAGTTA CGGTCAAAAT 

2Q CTTGCTAGTA TGTTAGCTGC TGAGCTAGGC 

CTTTTACATG ATGTTGGTAA AG CAATTG AT 
GGTGTAGAAT TAGCGAAAAA ATATGGTGAA 

25 CATCATGGTG ATGTTGAACC TACATCTATT 

TTGTCTGCGG CTCGTC CAGG TGCAAGAAAA 
GAACGTTTAG AAACGTTATC AGAAAGTTAT 

30 GCAGGTAGAG AAAT CCGAGT GATTGTATCT 

CGATTGGCTA GAGATATTAA AAATCAGATT 
AAGGTG ACAG TTGTTCGAGA GACTAGAGCA 

35 TCAGAAATTA GTGAGGGAGC TTTTTTAAGT 

ATCGGTAATA ACTATATTAA ACAGTAGTTA 
AAGAAG TT AT TGCTTTTAAT AAAAATGTTT 
GTAAACCTAT AAAGATGATT GGTTTTCTAT 
TCTTCTCTTC yG CAAT ATT A ATTAGGATTT 
TCTGTTTTCT TTAATTCTTT T AT AACTT CT 

45 

G AATATCT CT CTGCTAAACG ATATGCATTA 
TCCTCTGCAT CTTCGAATTT TGATGGGTTA 
GGATCAATAT TAATAGACAT GTATTTATTT 

50 

CTAACATATT GAAGTTTTCA GACAAAGTAA 



ATTAATGAGC AACTT CAAAG AGTAGAGGAA 1800 

AAAGAAAAAG AAAAAGAAGC TAAAGAAAAA 1860 

ACAGCAGTAC AAAGATTAGC AG CAGATCAC 1920 

TTACCTAATG ATGAGATGAA AGGTCGAATC 1980 

CTTGAAACTT TAACTGGCAT TGATTTAATT 2040 

TCTGGTTTTG ATCCAATAAG AAGAGAAATT 2100 

GATGGACGTA TTCATCCAGG TAGAATTGAA 2160 

GACGATATTA TTAGAGAAGC AGGTGAACAA 2220 

CATCCTGACT TAGTAAAAAT TGTAGGG CGT 2280 

GTACTTAAAC ATTCAATTGA AGTTGCGCAT 2340 

GAAGATGAGA CATTAGCGAA ACGAGCTGGA 2400 

CATGAAGTAG AAGGTAGTCA TGTTGAAATC 2460 

AATGAAACAG TTATTAATGC AATCCATTCT 2520 

ATATCTATCC TTGTTGCTGC TGCAGATGCA 2580 

GAAACATTAG AGAATTATAT TCGTCGATTA 264 0 

GATGGTGTAG AAAAAGCATT TGCGATTCAG 2700 

CCTGAAGAAA TTGATGATTT AAAATCTTAT 2760 

GAAGATGAAT TACAATATCC TGGTCATATC 2820 

GTAGAATATG CGAAATAATT TTTGTCTCCC 2 880 

TGTAGTCTTA At CTAGTTAG AGAGCACTTT 2 94 0 

TTTGAAAGTA AGACGGACCT TATATTAAAT 3 000 

TAGGCTTCGT AATTACTATA TTTATATTAT 3060 

CCAATAAAAA AGAAGAGAAG ATGTAACACA 3120 

ATTTCTAAGT TGAGTTATTT TAATTGTAAA 3180 

GCAGTATCAT AACAATTTGT TGCAATTGTT 3240 

ATGTAAAGCT TTAAACTTTC TTT AG CT AT A 3300 

GACATAACCA CTAATTCTGC AAATTTTTCT 3360 

ACAACTCCTA TTTATTTTGA TGTCTTAATA 3420 

TGTCTCTCTA TAATTGAAGA AAAATAATTC 3480 
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GGATGAACAA AACATGAGAA TAATGTTTAT 
CGCAATTGAA ACGTACATAC CTCAACTGAA 
5 AAATGCTGAA AATGCAGCAC ATGGTAAAGG 

AAGAAATGGT GTAGATTTCA TGACTATGGG 
TGATTTTATA GATGAAGCAA AACGACTAGT 

10 

OGGAATTGGT ATGAGATTTA TACAAATTAA 
AGGAAGAGCG TTTATGCCAG ATATTGATGA 
GGAAGCACAA GAACAAACTC CGTTTATATT 

1$ 

AAAGTATGCA ATGGGATGGC ATTTAGATGG 
CACATTCAAA CAGCAGATGA ACGTATTTTA 
GGTATGACAG GTTTTTATGA TGGCATTTTA 

20 

TTTATCACTA GTTTGCCACA AAGACATGTT 
GGTGTTGTTA TTGATTTAGA CAAAGAAGGT 

2S AATGATGACC ATCCATTTTC AACATTTTAA 

CTATCGTCCA TTAGTATGAA TTTAATATAG 
CTTTTTGTTA TCATTTAATA TGAAATATAT 

30 ATTATCGTGG AAAGTTGGCG GTCAACAAGG 

CGCTACGGCT ATGAATAGAA AAGGATATTA 
TATCAAAGGT GGACATACGA ATAATAAAAT 

55 TAGTGATGAT TTAGATATTT TGATTGCATT 

TGAAATGAGA GAAGACAGTA TTATTTTArC 
AGGATGTCAT GCACAGCTTA TTGAATTACC 

40 

AGCATTAATG AAAAACATGG TTGCAATAGG 
AAATACATTT GAAGAACTTA TTACTAATAT 
AGTCAATATC CAAGCATTAA ACGAAGGTTA 

45 

CTACGGGGAC TTTGAATTAG AGTCAACAGA 
CGATGCCATT GGATTAGGTG CAATTGCTGC 
TACACCTGCG TCTGAAGTTA TGGAATATAT 

SO 

GGTTATTCAA ACAGAAGATG AAATTGCTGC 



AGGGGATATC GTAGGTAAAA TTGGACGAGA 3600 

GCAAAAGTAT AAACCAACAG TTACAATTGT 3660 

TTTGACTGAA AAAATATATA AACAATTACT 3720 

TAATCACACA TATGGTCAAC GTGAAATTTA 3780 

AAGACCAGCG AATTTTCCGG ATGAAGCGCC 3840 

TGATATTAAA CTTGCAGTTA TTAATCTGCA 3900 

TCCTTTTAAA AAGGCAGATC AATTAGTCAA 3960 

TGTTGATTTT CATGCAGAAA CAACTTCTGA 4020 

TAGAsTAGCG CTGTTGTTGG AACGCATACA 4080 

CCAAAGGGGA CAGGGTATAT AACGGATGTT 4140 

GGAATAAATA AAACAGAGGT AATTGAGCGT 4200 

GTTCCAAATG AAGGTAGAAG TGTATTATCT 4260 

AAAACAAAGC ACATCGAACG TATATTGATA 4320 

AATTACGTAA GTAAACATTC GAATTGGACC 4380 

TACCACTGTT TACATAGTAA ATCGGTGGTT 4440 

CCATAGGAGG CATATAACTA TGAAACCACA 450*0 

CGAAGGTATT GAATCAACTG GGGAAATCTT 4560 

TTTATATGGA TATAGACATT TTTCAAGTCG 4620 

TAGAGTTTCT ACGACGCCTG TTCATGCAAT 4680 

TGACCAAGAA ACAATTGATG TTAACCATCA 4740 

TGATGCCAAG GCTAAACCTG TGAAaCCAGA 4800 

TTTTACAGCA ACCGCTAAAG AATTAGGTAC 4860 

TGCTACTAGC GCATTGATGA ATTTGAATAC 4 920 

GTTTTCTAAA AAAGGTGACA AGGTAGTTGA 4980 

TCAATTAATG CAATCTCGCT TACCTGAAAT 5040 

TGCACTACCA CATCTATATA TGATTGGTAA 5100 

AGGTTCACAA TTTATGGCGG CATATCCTAT 5160 

GATTGCCAAT ATATCTAAAG TAAACGGAGC 5220 

TGTAACTATG GCTATTGGTG CAAATTATGG 5280 
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TGGATTATCT GGTATGACTG AAACGCCATT AGTCATTATT AATACCCAAC GAGGTGGACC 5400 

TTCTACTGGA TTACCTACGA AACAAGAACA GTCAGATTTA ATGCAAATGA TTTATGGTAC 5460
ACATGGTGAT ATTCCAAAAA TTGTTGTAGC ACCAACAGAT GCAGAAGATG CATTTTATTT 5520
AACTATGGAA GCATTTAATT TAGCAGAACA ATATCAATGC CCTGTTATAG TTCTAAGTGA 5580
TTTGCAATTA TCTTTAGGTA AACAAACTGT TGAAAAATTA GATTATAATC GTATTGAAAT 5640
TAAACGTGGT GAAATCATTC AATCTGATAT TGAACGTGAA GAAGATGATA AAGGTTATTT 5700
CAAGCGTTAT GCGTtAACAT CCGATGGTGT TTCTCCTAGA CCTATCCCCG GTGTTAAAGG 5760
AGGTATTCAT CATATAACTG GTGTGGAaCa CAATGAAGAA GGTAAACCTA GTGAATCTGC 5820
GTCAAATAGA CAACAACAAA TGGAAAAACG AATGCGTAAA ATTGAGCAGT TACTAATTGA 5880
ATCGCCAGTA GAAGCTAACT TACAACATGA GGATGCAGAT ATTCTTTATA TCGGTTTTAT 5940
TTCTACAAAA GGTGCAATTC AAGAAGGTAG TAACCGTTTG AATCAACAAG GCATAAAAGT 6000
TAACACTATA CAAATTAGAC AATTGCATCC ATTCCCAACA AGCGTTATTC AAGATGCAGT 6060
TAATAAAGCG AAGAAAGTCG TTGTAGTGGA GCACAATTAT CAAGGACAAT TGGCTAGTAT 6120
TATAAAAATG AATGTCAATA TTCATGATAA GATTGAAAAT TATACAAAGT ATGATGGGAC 6180
ACCTTTCCTA CCACATGAAA TCGAAGAAAA AGGCAAAATA ATTGCTACTG AAATAAAGGA 6240
GATGGTATAG ATGGCGACAT TTAAAGATTT TAGAAATAAT GTTAAGCCTA ACTGGTGCCC 6300
CGGATGTGGC GATTTCTCAG TACAAGCTGC AATTCAAAAA GCAGCCGCAA ATATAGGGTT 6360
AGAACCTGAA GAAGTAGCTA TCATCACCGG TATAGGATGT TCTGGCCGTC TTTCAGGATA 6420
TATTAATTCT TATGGCGTTC ATTCTATTCA CGGACGTGCA TTACCTTTAG CTCAAGGTGT 6480
AAAAATGGCG AATAAAGATT TAACTGTTAT TGCATCGGGA GGAGATGGTG ATGGTTATGC 6540
TATASGTATG GGGCATACAA TCCATGCTTT AAGAAGAAAT ATGAACATGA CGTATATAGT 6600
CATGGATAAT CAAATTTATG GTTTGACAAA GGGACAAACA TCGCCGTCAT CAGCAGTAGG 6660
ATTTGTTACT AAAACAACGC CAAAAGGTAA TATAGAAAAA AATGTTGCGC CTTTAGAATT 6720
AGTATTATCA TCTGGTGCCA CATTTGTAGC CCAAGGTTTT TCAAGCGATA TTAAAGGATT 6780
AACAAAACTA ATTGAAGATG cAATTAATCA TGATGGATTT TCATTCGTTA ATGTCTTTTC 6840
ACCATGTGTG ACTTATAATA AAATTAACAC ATACGATTGG TTTaAAGAAC ATTTAACAAG 6900
TGTTGATGAc ATTGAAAATT ATGATTCTAC AGATAAACAA TTAGCGACTA AAACTGTTAT 6960
TGAACATGAA TCTTTAGTAA CTGGTATTGT TTATCaAGAT AAAGAAACAC CATCATATGA 7020
ATCtCAAATT AAAGAGTTAG ATGATmCACC ACTTGCTAAA AGAGATATCa AAATTaCTGA 7080
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TGTATTTATA ACAGATCCAT TTATGCTACT CAGTTTTTTA CTATTACAAA AAATAAAGGA 7200 

GTTTTTAAAA ATGAAAGACA CATTAATGAG TATACAAATA ATTCCTAAAA CACCAAACAA 7260
TGACAATGTT ATACCTTACG TAGACGAGGC GATTAAAATA ATTGACGAAT CTGGTTTGCA 7320
TTTTAGAGTA GGTCCGTTAG AAACGACAGT ACAAGGAAAT ATGAATGAAT GTTTAATTTT 7380
AATACAATCA TTAAATGAAC GAATGGTGGA ACTTGAATGT CCAAGTATTA TTAGCCAAGT 7440
TAAGTTTTAT CATGTGCCAG ATGGCATCAC TATTGAAACT TTAACTGAAA AATATGATGA 7500
ATAACATTAA AAGTGAAGTA AACTGGATTT GAATTGGCTT GTTAGAGATG ACGTATAACT 7560
TTAACTGTTT TTGCACTTTA TAGTTAAATT TAATATAATT ATTAAATGAT ACGGGCAAAT 7620
AGAAAGGATT TTGTAAAGTG AACGAAGAAC AAAGAAAAGC AAGTTCTGTA GATGTTTTAG 7680
CTGAGAGAGA TAAGAAAGCA GAAAAAGATT ATAGTAAATA TTTTGAACAT GTTTATCAGC 7740
CGCCTAATTT AAAAGCAAGC GCAAAAAAAG AGGTnAAA 7778

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
AGATGAAGTT GTTACgAAAA TTGCGTACGC TGTTTCAGAA CATGTCAAAA TAGAAACAGG 60
TAATCCATTC TTTCAAACAT CACATAGTGG TTGTGCGACG GGCGGATCCT GTAATTGTTC 120
ATTATAAAAA ACATCGAGTC AGAAAAAGGT GGTTATTGAA cCACTAACTA GCATCTGACT 180
CGATGTTTTT ATTTATTCGG GATTGTTTGT TTGAATTGTT GTGCTAAATC TGGTCGATCT 240
GTCACAATCG TGTGTGCACC TTTTTGGTAT AAATCATTCA TCAGATTTAT ACTATTTACG 300
CCATAATAGC CTGGAATGAT ATTCATATCA TTTAACCATT TGATAAAACG AGATGAAGTC 360
AAATCAATGC CTTTAAAATG AGTAGGCATT TGGAACGTTT GTGCTAATGG TTGGTAGTAC 420
CTACCACCTA ATAAATGATA TTTTAAAAAT GCTTCTGTAA CTTCCTGTTG GCTAGCACCA 480
ATTGCGACGG ATCCTTGTGC AATTTTATTA AAACGAACGA TTTGTTCTTT ATAAAAACTT 540
GTCACAAGAA CGCGGTCAAA TGCTTGATTT TCTGCAATTG TATCAAACAT AATTTGTGGT 600
GCGATTGAGC CTTCATAGGA TTCAGGAGCA TCTTTTAAGT CTACGTTTAT ATACATATCA 660
GGATATTGCT TCAGCAACTc ATCGAAGGTT AGTATAGCTG TGTGTGCATG ACCACGATAT 720
AATGTATGGG CACTAACTTT TCCAGAGCCG TTCQTCGTTC TATCAACAGT TGCGTCATGA 840 

AAAACGATAA GCTGTTGATC TTTTGTGAGT CTCACATCTG TTTCAAAGCC ATCAACGCCT 900 

AATTGTTTAG CATAGTCAAA TGCAAGTTGC GTTTGCTCTG GTCTTAAAGC CATACCACCG 960 

CGATGCGCAA ATATATATGG TGCATTGCCT TTGAAAAAAG CAGGGATCGT TTGCTTTTTA 1020 

GTAATCACTT TATTTTTATT GATCATTAAT AGACTACTTA AAAATCCAGC ACCGACTAGT 1080 

ACCGCATTTA AAATGTTTCT GTTTACnTTT TTCATAAAAA ATTCCTCC 1128 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6252 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 



CAAGCAAACA ATCGTCGATA AAATTGCTAA AATAATAAAA GTAATTCGAA CTTTCATCAT 60 

GATCATCCTT TGTTTATAGA GTCAATATAA GTATGGAATA TGTTAGGTAT ATAGTCAAAT 120 

GCGTCAACTA ATGGGAATTT TGGCATAGAT AGAGAATTTA AGGCAATTAA AAAGGCATCA 180 

AACAGTAATA TGCTGCTTGA TGCCCAAATG ATGACTTTAG CTAAATTGAT TAGTCACTTT 240 

TAAAGATAAA GAATTGTCAT GAATTAAAAC TCATGTAATG ATGTGTTACA TTTCGCAATG 300 

ATGGCTTTCA GTTATTTATC GATAACATCA CTCTTGATAC CTTTAGATTT TAAGAAATCT 360 

TTAATTTTAT CTTGTTGCTT TTTATTAACA TCACCGGCAT ATTTTGTTGG CACGTCGACA 420 

ACATTGATTT TATTTTGCGG TTGATAGCTA AGCTTTTCAA TATCTTCATC AACATTGGCG 480 

ATTGTACTAT TTAAAGCTTT GAAGTAATTC ATCATTAATT CAACGGGTTT CTTATATTCT 540 

TTAGGAATAT TGTTTTCAGT GACAAATTTC TTGAAATGCA AATCGTTTTT AACAGCTAAG 600 

TTAGATAAGT GGCTAAGTGT TTCTGCTTGT TTTTCAGTCA CTTTTGTTTG ACTGTCAATT 660 

TGTTTATCTA GTTTATGTTG CATAATATAT TTGTTATCAA GTATATCGCT ATTTACAGAC 720 

AAATACTTTT CTATAGCTTG CTTCATCTCT GCATCACTAA TATCACTATT TTTCTTATCT 780 

GAGTTAAAGA TATCTTTTGT tTCTAATTTT TTAGCGCTTT TAGGTGCATG GATGCCAGTA 840 

CTTGTATGAT GATCTTCGTT ATCAGATTGA TCGGACGCGC AACCTGTAAG AATTAATGTC 900 

GATGCTAAAA ATGTACTTAG TAGTAATCTC TTTTTCATAA TGTAATATAA CTCCTTAGTT 960 

TATCTTTAAT TGAAAAAATA TGTATTCATG TTTAATAGAG TAACATTGAA TTAGTTTGGA 1020 
TCTATCAATA ATGCATCATT TTGGACGTTG TTAAGGATAG CTTTATCTAT AAATAACTGC 1140 

ATAATTGGTT GTACTAATTT AGACGTAGGT ATCGTACGTA AAAGCATAAT AATTTCGTTC 1200 

ACATACTTTT CTTTCTCAAT ATCATTTTTC ATATTGATTT GTTTGCGAGA GGTACATACT 1260 

TTAAGCATTA TCGCACATCT CGTTGTATAT ATTAAGTTTA TCATAACATG ATTTTATGTC 1320 

GGGATAAAAA AATAACAGCA TCTTAACAAA TGTAAGATAC TGTCAGTGAA ATGAATOAAA 1380 

CTTTAGTTTC TGaTAATATA GTCAAAGGCA TTTAATGCTG CATTTGCACC AGCGCCCATT 1440 

GAAATGATAA TTTGTTTGTT CTTCTGATCT GTGACATCGC CAGCAGCAAA TATTCCAGGA 1500 

ACATTCGTAT TATTGTTACG ATCAATCACA ATTTCACCAC GTTCGTTTAA TTCAACAGCA 1560 

TCGTTTAACC ATGATGTGTT TGGAAGTAAA CCAATTTGAA CAAAGATACC ATCTAAGTTA 1620 

AGTAGATGTT CTTCGCCGGT GTTCATGTCT TCGTAACGTA TACCTGTAAC ATGGTCTTCT 1680 

CCGACAACTT CAGTAGTTTT GGCATTTGTT TTGATATCAA CATTTGATAA AGAACGTAAA 1740 

CGATCTTGTA ACAOGTTGTC TGCTTTTAAT TCGCTAGCGA ATTCGAATAA TGTAACATGA 1800 

TTAACGATAC CAGCAAGGTC AATTGCTGCT TCAACCCCAG AGTTACCGCC ACCGATAACT 1860 

GCTACGTCTT TATTTTCAAA TAGAGGTCCG TCACAGTGAG GGCAGAATGC AACACCTTTA 1920 

TTAATCAATT GCTCTTCACC TGGAATGTTT AGCTTACGCC AACCTGCACC AGTAGCAATA 1960 

ATGACTGTTT TACTTTCTAA GACAGCACCG TTTTCTAACG TAACTTTAAT TGCTTCGTCA 2040 

GTCTTTTCGA TATCTGTAGC ACGTATACCT GTCATTGCAT CAATGTCATA TTGATCAATG 2100 

TGCGCTGCTA AGTTAGAAGA AAATTCAGAA CCAGTTGTTT CTTTAACAGT AATGAAGTTC 2160 

TCAATACCAG CAGTATCATT AACTTGGCCA CCGATACGAT CAGCAACTAT ACCAGTACGT 2220 

AAACCTTTAC GTGCTGTGTA AATCGCTGCA CTACCACTAG CAGGACCACC ACCAACGATT 2280 

AAGACATCAT AAGGTTCTTT ATTTTCAAAC TCAGATGCAT CTGCCGTACT GCCTAGTTTC 2340 

GAAAGAATAT CTTGGATTGT CATACGACCA TTGCCAAATT CTTCGCCATT TAAAAAGACA 2400 

GCAGGGACTG CCATGATGTT TTCAGATTCT TCACGGAACA CTGCACCATC AATCATAGAA 2460 

TGCGTGATGT TAGGGTTGAT CACACTCATT AAGTTAAGTG CTTGAACGAC ATCAGGACAT 2520 

TTTTGACACG TTAAACTAAT GAATGTTTCA AAATGGAATG AACCTTCTAA TTTTTTAATT 2580 

TGGTCAATGA TTGACTGTTT TTCTTTAGGT GCACGACCAC TAACCTGTAA AATTGCTAAA 2640 

ACAAGTGAGT TAAACTCGTG ACCTAATGGA ATACCTGCAA ATGTTACACC TGTTTCTTCG 2700 

CCAGGACGAT TGACTGAGAA ACTTGGTGTA CGTTTTAAAG ATTTTTCAGA AAGAGATAGT 2760 

CTAGGTGACA TATCAGTAAT TTCTGTCAAC AAATCTTTAA GTTCTTTGGA TTTATCATCT 2820 
TGTTGTTTTA AATCAGCATT AAOCATGGTT GTAATGCCTC CTTAGATTTT ACCTACTAAA 2940 

TCTAAACCAG GTTGCAATGT TTTAGCGCCT TCTTCCCATT TAGCTGGGCA TACTTCGCCA 3000 

GGGTTTTTAC GAACATATTG AGCTGCTTTG ATTTTGTGAG CTAATGTACT AGCGTCACGG 3060 

CCAATTCCGT CAGCGTTAAT TTCAGATGCT TGTACAACAC CGTCTGGGTC GATAATGAAT 3120 

GTACCACGTT GAGCTAAACC AGTAGCTTCA TCTAATACAT CAAAATTACG AGTGATTGTT 3180 

TGTGATGGGT CACCAATCAT AGTGTAAGTG ATTTTGCTAA TTGCATCTGA ATGGTCATGC 3240 

CATGCTTTGT GTACGAAGTG AGTATCAGTT GATACTGAGA ATACATTTAC GCCTAATTTT 3300 

TGTAATTCTT CATATTGGTT TTGTAAGTCT TCTAATTCAG TTGGACAAAC GAATGAGAAG 3360 

TCAGCAGGAT AGAAGCATAC TACGCTCCAA GAACCTTTTA AATCTTCTTG TGTAACTTCT 3420 

TTAAATTGAT CTTTTTTTGG ATCGAAArCT TGCGCTGTAA ATGGTAAGAT TTCTTTGTTA 3480 

ATTAATGACA TAAATATCTT CCTCCTAAGA ATTTAAGTAT GAATTAGAAC TATGAATTGA 3540 

TTGCGCTTAA TTATAATAAT TCTAATCTCT TAGTTAGCAT TATTACATTT TGATCCAGAA 3600 

TAGTCAACTG GATAACTTTG TAAAGTGAAT GATTACTTTT AAAATAAAGA AAGATAATAT 3660 

AAAGTGCTTT GATAATGGAT TTTGTAGTTG ATGATTTAAA AGGTTGTGTC TATATTTAAT 3720 

ATCTTGATTT TAATGTAAAA AATGTAAAAA AAGAAGATTT GTATTCTCAA CTAAGTCAAC 3780 

CTTATTGATA ATGGTATGAG AATATTTGTT CGAGATGGAT GAAGGTAATG AGTGAGAAAC 3840 

TGGATTTTTA AAGTATGAGA CAATATTTTA AAAAGTTCAA TTATTAACTT ATAAGCAAAT 3900 

AATTGCTATA AAAAAGTTTG GACGTGTACA ATTGCAATAt GAAGATTTTA AATTAATTGT 3960 

AAAGTATCGA GGAGTGGGTA ACGTGTCAGA ACATGTATAT AATCTTGTGA AAAAGCATCA 4020 

TTCTGTTAGA AAATTTAAGA ATAAACCTTT AAGTGAAGAC GTTGTTAAGA AATTGGTAGA 4080 

AGCTCGACAA AGCGCTTCGA CGTCAAGTTT CCTGCAAGCA TACTCAATTA TTGGTATCGA 4140 

CGATGAGAAG ATTAAAGAAA ATTTACGAGA AGTTTCTGGA CAACCTTATG TTGTAGAAAA 4200 

TGGCTATTTA TTCGTCTTTG TTATTGATTA TTATCGTCAT CATTTAGTTG ATCAACATGC 4260 

TGAAACTGAT ATGGAAAATG CATATGGTTC AACGGAAGGT TTGCTAGTAG GTGCAATCGA 4320 

TGCAGCATTA GTTGCCGAAA ATATTGCGGT AACTGCTGAA GATATGGGGT ATGGCATTGT 4380 

CTTTTTAGGA TCATTAAGAA ATGATGTTGA ACGCGTTCGA GAAATTTTAG ACTTACCTGA 4440 

CTATGTCTTC CCGGTATTTG GTATGGCAGT AGGGGAACCc GCAGATGACG AAAATGGTGC 4500 

AGCCAAGCCA CGCTTACCAT TTGACCATGT CTTCCATCAT AATAAGTATC ATGCTGATAA 4560 

GGAAACACAG TATGCACAAA TGGCAGATTA CGACCAGACA ATCAGCGAGT ACTATGATCA 4620 
CAAAGCAAGA TTAGATATGT TAGAACAATT GCAAAAATCA GGCTTAATAC AGCGATAgCA 4740 

AGATACCAAA ATAACCCGCC CCCCTCTAGC TTAAAATGAT AAGTATAGCT AGAGGGGGCG 4800 

GGTATTTCTT GCAATGAATT AGTGTGAAGT TAATGCAGCA TTATCATTTG AATCGAAAGT 4860 

ATCTTTATCC CAATGTTTAG TTAACTTGGC GGTACCTGTA CCAGCTAGCA TTGAATCGTT 4920 

CACGTTTAAT GCTGTTCTAC CCATGTCAAT CAATGGTTCA ACGGAGATGA GCACGCCGGc 4980 

TAAAGCGACT GGCAAGTTTA ACGTTGACAA CACCAATATG GATGCAAATG TAGCCCCGCC 5040 

ACCGACGCCA GCAACGCCGA ATGAACTAAT AATCACGACA GCGATTAACG TTACAATAAA 5100 

TTGTAAATCA ATTTCTACAT TAGCGACGGG TGCGACCATA ATTGCAAGCA TGGCAGGGTA 5160 

AATGCCTGCA CAACCATTTT GTCCAATCGA CAATCCAAAT GTCGCAGCGA AATTGGCAAT 5220 

ACCTTCTGGC ACGCCTAGAC GTCTTGTTTG TGTTTGTACA TTCAATGGTA AGGCACCCGC 5280 

GCTTGAGCGT GATGTGAATG CAAAGATTAA TACTTCCAAA GTCTTTTTAA CATAGCGAAT 5340 

TGGGCTAATA CCTAACAGGC TTAAAATAAT TAAGTGAATG ATATACATCG TAATTAATGC 5400 

AGCGTACGAT GCGATTAAGA ATTTTCCTAA AGTCCAAATG GCGCCAAAGT CACTTGTCGA 5460 

TAATGTGTTG GCCATAATTG CTAATACACC GTATGGCGTT AAACGTAAGA CGAACGTCAC 5520 

AATCGCCATT ACTAGTGAAT AGATAGCGTC AATCGCACGC TTAAGCAATT CACCATGATC 5580 

AGGTTGTTTG CGTnTACGCG TAAATAAGCA AATCCTATAA ACGAAGCAAA TATCACGACA 5640 

GCAATCGTGG aAGTTGCACG TTGTCCaGTG AAATCTAAGA ATGGATTTTT AGGCAATAAT 5700 

TCCAAAATTT GTTGTGGTAA CGTATGTGCT GTTAAATCTT TCGCTTGTTT AGCAATTTCG 5760 

CTTCCACGTG CTTGTTCAGC GTTACCAAGG TTAATTGTTG ATGCATCTAA ACCAAACACC 5820 

AAGGCATACA CAACACCAAC AATCGCAGCA ATGGTGACAG TGCCAATTAA AAAGATAAAA 5880 

ATGASACTAC CAATTTTAGC AAACTTTTCT CCGATTTGAA TTTTAGTGAA TGCAGCTACA 5940 

ATAGAAATGA AAATTAAAGG CATAACAATC ATTTGCAACA ATGCAACGTA ACCTTGTCCG 6000 

ACAATGTTGA ACCAGTCACT TGTTGATGTA ATAACATTCG AATGTGTGCC ATAAATAAGA 6060 

TGCAATAACA CACCGAATAC TATACCAATC CCTAAAGCTG TAAACACACG TTTCGCAAAA 6120 

GATATATGTT TGCGAGCCAT CATGTGCAAT ATTACGATGA AAATCACCAA TACAATAATA 6180 

TTAATCAGTG TAAGAAAAGC ATTCATGAAC GTCACTCCTT AAATTTTTGA ATATAATTCC 6240 

GACTAGTATG CT 6252 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6730 base pairs 
(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 



ATCAAATCnC AAAATATTTA TTAATnAnAA GGGGATTATC CaTGTgAGAA ACAAAGTAAT 60 

GCTCTTTTTT TACCTCTTGT GGGTTGAAAA aTGGATCATC AGAGATAGAC TTCTTCTTTT 120 

TCGAAGATGA CATTTGATAC TTTAATCTTC TAAAACCATA ACTTGTCGCA TCAAAAATGC 180 

CTTCTTGTAC AAGTAAAATC AAAAATATGC TAATAAAAAT AATTAATGAA ACATAAAACA 240 

ATATATTTAA ATATGTAATG ATAGTATGGC TATTAAAAAG CCATATAATA AACGTTAATA 300 

TTGGCGTTAT TAGTGCCATT CCAAGCCATT TTTTCAACAT TTGATCACTC CCACTTATAG 360 

AAAACTCTTA CGCATAGTTT ACATTAAAAT CAGACATTGA GGAATGATTT TTTAATTTCT 420 

TCAGCTTTAT TGAAATTCTA AAATCAATCA TTCTTCATTA GTTTAAAGCA AAAAAATATT 480 

GATATATAGT AAATATTGTA TATATAATAT TAGTTAAGAT TTCaGAAAAT TTTGAAGGGA 540 

[illegible] GGAGGGGATT AGATGGGGAA ATATATTTTC 600 

AAACGATTTA TTTATATGCT TATTTCTTTA TTTATTATTA TTACAATTAC ATTTTTCTTA 660 

ATGAAATTAA TGCCAGGTTC GCCATTTAAC GATGCTAAAT TAAATGCTGA ACAAAAAGAA 720 

ATTTTAAATG AAAAATATGG ATTAAATGAT CCTGtAGCTA CGCAgTATTT ACATTATTTA 780 

AAAAATGTTG TTACAGGCGA TTTTGGTAAT TCATTCCAGT ATCATAATCA ACCTGTGTGG 840 

GATTTGATTA AACCGAGACT ACTACCTTCT TTTGAAATGG GTCTTACAGC AATGTTCaTC 900 

GGTGTGATAC TGGGACTTAT TTTAGGTGTT GCAGCAGCTA CTAAACAAAA TTCTTGGGTT 960 

GACTATACAA CTACAGTTAT TTCAGTTATT GCAGTATCTG TACCATCTTT TGTACTTGCT 1020 

GTACTTTTAC AATATGTATT TGCAGTTAAA TTAAGATGGT TCCCAGTAGC TGGATGGGAA 1080 

GGTTTTTCGA CCGCGGTATT ACCGTCACTT GCATTATCTG CAGCTGTTTT AGCAACTGTC 1140 

GCCAGATACA TAAGAGCAGA GATGATAGAG GTATTAAGTT CAGACTATAT TTTATTAGCG 1200 

AGAGCTAAAG GTAATTCGAC AATGCGTGTA CTTTTTGGAC ATGCACTTAG AAATGCTTTA 1260 

ATTCCAATTA TTACAATTAT CGTTCCCATG TTAGCAAGTA TTTTAACAGG CACTTTAACA 1320 

ATTGAAAATA TTTTTGGAGT TCCTGGATTA GGGGATCAAT TCGTACGTTC AATTACAACA 1380 

AATGATTTCT CAGTAATCAT GGCAATCACA CTATTATTTA GCACACTGTT TATCGTTTCT 1440 

ATTTTTATTG TAGATATTTT GTACGGTGTG ATAGATCCAC GAATTCGTGT TCcAAGgAGG 1500 

TAAAAAATAA TGGCTGAAAA TAAAAACAAT TTGTCGATTA ACGACGATCA TTCTAATGCA 1560 



TGAATCAGGA ACCTGAAATG CAACGAGAAA GCAAAAACTT TTGGCAAGAT GCTTGGGCTC 1680 

AGTTAAAACG AAATAAGTTA GCTGTTGTCG GTATGATAGG TTTAATTATC ATTGTAATAT 1740 

TTGCTTTTAT CGGTCCAGTT ATAAATAAAC ATGATTATGC TGAACAAAAT GTAGAACATA 1800 

GAAATCTTCC GGCAAAAATA CCTGTATTAG ACAAAGTTCC ATTTTTACCT TTTGATGGTA 1860 

AAGATGCAGA TGGCAAGGAT GCTTATAAAG CAGCAAATGC TAAAGAAAAT TATTGGTTTG 1920 

GTACTGATCA GTTGGGTCGA GATTTATGGA CAAGAACATG GAAAGGTGCT CAAATTTCAT 1980 

TGTTTATCGG TGTTGTTGCA GCGATGTTAG ATATTTTTAT TGGTGTTGTA TATGGTGCGA 2040 

TTTCTGGATT CTTCGGTGGA CGTGTCGATA CGATTATGCA ACGTATACTT GAAGTCATAG 2100 

CATCTATTCC GAATTTAATT GTCGTAATTT TATTTGTATT AATTTTTGAA CCATCCATTT 2160 

GGACAATTAT ATTGGCTATG TCTATCACAG GCTGGTTAGG CATGAGCAGA GTTGTACGTG 2220 

GAGAATTTTT AAAATTAAAA AATCAAGAGT TTGTCATGGC TTCGAAAACA TTGGGGGCTT 2280 

CAAAATTCAA ATTGATATTT AAGCATATTT TACCTAATAC ATTAGGTGCT ATCGTGGTTA 2340 

CATCAATGTT TACAGTACCT AGTGCTATTT TCTTCGAAGC ATTTTTAAGT TTCATTGGTA 2400 

TAGGTGTACC CGCACCTCAA ACATCGTTAG GGTCATTAGT AAATGATGGG CGCGCAATGT 2460 

TATTAATTTA TCCACATGAA TTATTTATAC CAGCAATGAT TTTAAGTTTA TTAATTCTAT 2520 

TCTTTTACTT ATTTAGTGAT GGATTACGTG ATGCATTTGA TCCGAAAATG CGTAAATAAA 2580 

AAGGGGGCAT AGCATATGAC TGAAAGAATA TTAGAAGTAA ATGATTTGCA TGTTTCCTTT 2640 

GATATTACAG CAGGGGAAGT GCAGGCAGTG AGAGGCGTAG ATTTTTATTT GAACAAAGGG 2700 

GAAACATTGG CAATTGTTGG TGAATCAGGT TCAGGTAAAT CTGTAACAAC AAAAGCAATT 2760 

ACAAAATTAT TCCAAGGGGA CACAGGAAGA ATTAAAAAGG GAGAAATTTT ATTTTTAGGG 2820 

GAAGATTTAG CAAAAAAACC TGAAAATGAG TTGATTAAAT TACGTGGCAA AGATATTTCA 2880 

ATGATCTTTC AAGATCCAAT GACATCTTTA AACCCAACGA TGCAAATTGG TAAACAAGTC 2940 

ATGGAACCAT TAATTAAGCA CAAAAATTAT AGTAAAGCAC AAGCTAAAAA GCGCGCATTG 3000 

GAAATACTAA ATCTTGTAGG TTTACCAAAT GCAGAAAAAA GATTTAAAGC ATATCCTCAT 3060 

CAATTTTCAG GTGGACAAAG GCAAAGAATT GTTATTGCAA CCGCATTAGC TTGTGAACCT 3120 

AAAGTGCTCA TTGCTGATGA ACCAACGACT GCATTAGACG TAACGATGCA GGCACAAATT 3180 

TTAGATTTAA TGAAAGAACT ACAACAAAAA ATCGATACAG CAATTATTTT TATAACGCAT 3240 

GATTTAGGGG TTGTTGCGAA TATTGCTGAT AGAGTGGCAG TTATGTATGG TGGTCAAATG 3300 

GTTGAAACAG GAGATGTTAA CGAAATATTT TATGATCCAA AGCATCCATA TACATGGGGA 3360 




